Preface


This new edition of my algebra textbook has a number of changes. 

The most significant is that the book now tries to live up to its title better 
than it did in the previous edition: the introductory chapter has more than 
doubled in length, including basic material on proofs, numbers, algebraic manip- 
ulations, sets, functions, relations, matrices, and permutations. I hope that it is 
now accessible to a first-year mathematics undergraduate, and suitable for use 
in a first-year mathematics course. Indeed, much of this material comes from 
a course (also with the title ‘Introduction to Algebra’) which I gave at Queen 
Mary, University of London, in spring 2007. 

I have also revised and corrected the rest of the book, while keeping the 
structure intact. In particular, the pace of the first chapter is quite gentle; in 
Chapters 2 and 3 it picks up a bit, and in the later chapters it is a bit faster 
again. Once you are used to the way I write mathematics, you should be able 
to take this in your stride. Since the book is intended to be used in a variety of 
courses, there is a certain amount of repetition. For example, concepts or results 
introduced in exercises may be dealt with later in the main text. New material 
on the Axiom of Choice, p-groups, and local rings has been added, and there are 
many new exercises. 

I am grateful to many people who have helped me. First and foremost, Robin
Chapman, for spotting many misprints and making many suggestions; and Csaba 
Szabo, who encouraged his students (named below) to proofread the book very 
thoroughly! Also, Gary McGuire spotted a gap in the proof of the Fundamental 
Theorem of Galois Theory, and R. A. Bailey suggested a different proof of Sylow’s 
Theorem. The people who notified me of errors in the book, or who suggested 
improvements (as well as the above) are Laura Alexander, Richard Anderson, 
M. Q. Baig, Steve DiMauro, Karl Fedje, Emily Ford, Roderick Foreman, Will 
Funk, Rippon Gupta, Matt Harvey, Jessica Hubbs, Young-Han Kim, Bill Martin,
William H. Millerd, Ioannis Pantelidakis, Brandon Peden, Nayim Rashid,
Elizabeth Rothwell, Ben Rubin, and Amjad Tuffaha; my thanks to all of you, 
and to anyone else whose name I have inadvertently omitted! 


P.J.C. 
London 
April 2007 


Preface to the first edition 


The axiomatic method is characteristic of modern mathematics. By making our 
assumptions explicit, we reduce the risk of making an error in our reasoning based 
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on false analogy; and our results have a clearly defined area of applicability which 
is as wide as possible (any situation in which the axioms hold). 

However, switching quickly from the concrete to the abstract makes a heavy 
demand on students. The axiomatic style of mathematics is usually met first in a 
course with a title such as ‘Abstract Algebra’, ‘Algebraic Structures’, or ‘Groups, 
Rings and Fields’. Students who are used to factorising a particular integer or 
finding the stationary points of a particular curve find it hard to verify that a 
set whose elements are subsets of another set satisfies the axioms for a group, 
and even harder to get a feel for what such a group really looks like. 

For this reason, among others, I have chosen to treat rings before groups, 
although they are logically more complicated. Everyone is familiar with the 
set of integers, and can see that it satisfies the axioms for a ring. In the early 
stages, when one depends on precedent, the integers form a fairly reliable guide. 
Also, the abstract factorisation theorems of ring theory lead to proofs of impor- 
tant and subtle properties of the integers, such as the Fundamental Theorem 
of Arithmetic. Finally, the path to non-trivial applications is shorter from ring 
theory than from group theory. 

I have been teaching algebra for the whole of my professional career, and 
this book reflects that experience. Most immediately, it grew out of the Abstract 
Algebra course at Queen Mary and Westfield College. Chapters 2 and 3 are based 
fairly directly on the course content, and provide an introduction to rings (and 
fields) and to groups. The first chapter contains essential background material 
that every student of mathematics should know, and which can certainly stand 
repetition. (A great deal of algebra depends on the concept of an equivalence 
relation.) 

Chapter 4, on vector spaces, does not try to be a complete account, since 
most students would have met vector spaces before they reach this point. The 
purpose is twofold: to give an axiomatic approach; and to provide material in 
a form which generalises to modules over Euclidean rings, from where two very 
important applications (finitely generated abelian groups and canonical forms of 
matrices) come. 

Chapter 7 carries further the material of Chapters 2 and 3, and also intro- 
duces some other types of algebra, chosen for their unifying features: universal 
algebra, lattices, and categories. This follows a chapter in which the number sys- 
tems are defined (so that our earlier trust that the integers form a ring can be 
firmly founded), the distinction between algebraic and transcendental numbers 
is established, and certain ruler-and-compass construction problems are shown 
to be impossible. The final chapter treats two important applications, drawing 
on much of what has gone before: coding theory and Galois Theory. 

As mentioned earlier, Chapters 2 and 3 can form the basis of a first course on 
algebra, followed by a course based on Chapters 5 and 7. Alternatively, Chapter 3 
and Sections 7.1–7.8 could form a group theory course, and Chapters 2 and 5 and
Sections 7.9–7.14 a ring theory course. Sections 2.14–2.16, 6.6–6.8, 7.15–7.18, and
8.6–8.11 make up a Galois Theory course. Sections 6.1–6.5 and 6.9–6.10 could
supplement a course on set theory, and Sections 2.14–2.16, 7.15–7.18, and
8.1–8.5 could be used in conjunction with some material on information theory for
a coding theory course. 

Some parts of the book (Sections 7.8, 7.13, and probably the last part of 
Chapter 7) are really too sketchy to be used for teaching a course; they are 
designed to tempt students into further exploration. 

At the end, there is a list of books for further reading, and solutions to 
selected exercises from the first three chapters. 

Asterisks denote harder exercises. 

There is a World Wide Web site associated with this book. It contains solu- 
tions to the remaining exercises, further topics, problems, and links to other sites 
of interest to algebraists. The address is 


http://www.maths.qmul.ac.uk/~pjc/algebra/


Thanks are due to many generations of students, whose questions and per- 
plexities have helped me clarify my ideas and so resulted in a better book than 
I might otherwise have written. 


P.J.C. 
London 


December 1997 
1 Introduction


The purpose of this chapter is to introduce you to some of the notation and ideas 
that make up mathematics. Much of this may be familiar to you when you begin 
the study of abstract algebra. But, if it is not, I have tried to provide a friendly 
introduction. Your job is to practice unfamiliar skills until you are fluent. If you 
do not feel confident, please read this chapter carefully. 

Much more than most scholarly disciplines, mathematics is structured; each 
subject assumes knowledge of its prerequisites and builds on them. But nobody 
studies mathematics starting with the logical foundations and working upwards. 
My view of the subject is more like a building which has basements and attics, 
but where you enter at the ground floor, with the knowledge you already have; 
then you can go upstairs to the applications or down to the foundations as you 
please. 

This chapter, after a brief discussion of the structure and symbolism of 
mathematics, proceeds to give accounts of the topics which make up the com- 
mon language of mathematicians: numbers, sets, functions, relations, formulae, 
equations, matrices, and logic. Much of the material comes back later in more 
serious and rigorous form. For example, in the first section, I will prove two 
famous theorems from Greek mathematics, about the infinitude of the primes and 
the irrationality of the square root of 2, even though numbers are not discussed 
until the second section. 


What is mathematics? 


Mathematics is not best learned passively; you don’t sop it up like 
a romance novel. You’ve got to go out to it, aggressive, and alert, 
like a chess master pursuing checkmate. 


Robert Kanigel (1991). 


No one would doubt that a mathematics book is not like a novel. It is full 
of formulae using strange symbols and Greek letters, and contains words like 
‘theorem’, ‘proposition’, ‘lemma’, ‘corollary’, ‘proof’, and ‘conjecture’. Many of 
these words are themselves Greek in origin. 

This is the legacy of Pythagoras, who was probably the first mathematician in 
anything like the modern sense (as opposed to somebody who used mathematics,
such as a surveyor or an accountant). We know little about Pythagoras, and 
what we do know is unreliable, but it is clear that he cared very deeply about 
the subject: 
the word ‘theory’ ... was originally an Orphic word, which
Cornford interprets as ‘passionate sympathetic contemplation’ ... 
For Pythagoras, the ‘passionate sympathetic contemplation’ was 
intellectual, and issued in mathematical knowledge ... To those 
who have reluctantly learnt a little mathematics in school this 
may seem strange; but to those who have experienced the intox- 
icating delight of sudden understanding that mathematics gives, 
from time to time, to those who love it, the Pythagorean view will 
seem completely natural ... 


Bertrand Russell (1961). 


1.1 Notations. The most important thing about mathematics is that the 
assertions we make have to have proofs; in other words, we must be able to 
produce a logical argument which cannot be attacked or refuted. We will see 
many proofs; the next section contains two classics from the ancient Greeks. 

The words ‘Theorem’, ‘Proposition’, ‘Lemma’, and ‘Corollary’ all have 
the same meaning: a statement which has been proved, and has thereby 
become part of the body of mathematics. There are shades of difference: a 
theorem is an important statement; a proposition is one which is less impor- 
tant; a lemma has no importance of its own but is a stepping stone on the 
way to a theorem; and a corollary is something which follows easily from a 
theorem. 

The word ‘Proof’ indicates that the argument establishing a theorem (or other 
statement) will follow. The end of the argument is marked by the special symbol
∎. If an exercise asks you to ‘prove’, ‘show’, or ‘demonstrate’ some statement,
you are being asked to construct a proof yourself. 

A ‘Conjecture’ is a statement which is believed to be true but for which 
we do not yet have a proof. Much of what mathematicians do is working to 
establish a conjecture (or, since not all conjectures turn out to be true, to refute 
one). Another important part of our work is to make conjectures based on our 
experience and intuition, for others to prove or disprove. (The great twentieth- 
century Hungarian mathematician Paul Erdős said, ‘The aim of life is to prove
and to conjecture.’) 

Mathematicians have not always been consistent about applying these terms. 
Sometimes it happens that a result which first appeared as a lemma came to be 
regarded as more important than the theorem it was originally used to prove. 
(See Gauss’ Lemma in Chapter 2 for an example. One result in Chapter 6, Zorn’s 
Lemma, is really an axiom!) Also, one of the most famous conjectures (until 
recently) was ‘Fermat’s Last Theorem’, which asserted that there cannot exist 
natural numbers x, y, z, n with x, y, z > 0 and n > 2 such that x^n + y^n = z^n.
Fermat asserted this theorem and claimed to have a proof, but no proof was 
found among his papers and it is now believed that he was mistaken in thinking 
he had one; but the name stuck. The conjecture was proved by Wiles in the 
1990s, but we still call it ‘Fermat’s Last Theorem’ rather than ‘Wiles’ Theorem’. 
A ‘Definition’ is a precise way of saying what a word means in the mathematical
context. Here is Humpty Dumpty’s view (in the words of Lewis
Carroll): 


When I use a word, it means exactly what I want it to mean, 
neither more nor less. 


In mathematics, we use a lot of words with very precise meanings, often quite 
different from their usual meanings. When we introduce a word which is to have 
a special meaning, we have to say precisely what that meaning is to be. Usually, 
the word being defined is written in italics. For example, you may meet the 
definition: 


An m × n matrix is an array of numbers set out in m rows and
n columns. 


From that point, whenever you come upon the word ‘matrix’, it has this 
meaning, and has no relation to the meanings of the word in geology, in medicine, 
and in science fiction. 

Most of the specialised notation in mathematics will be introduced as we go 
along. Because we use so many symbols in our arguments, one alphabet is not 
enough, and letters from the Greek alphabet are often called on. Table 1.1 shows 
the Greek alphabet. 

Other alphabets including Hebrew and Chinese have been used on occasion 
too. 

Another specialised alphabet is ‘blackboard bold’: 


𝔸𝔹ℂ𝔻𝔼𝔽𝔾ℍ𝕀𝕁𝕂𝕃𝕄ℕ𝕆ℙℚℝ𝕊𝕋𝕌𝕍𝕎𝕏𝕐ℤ.


This alphabet originated because, in print, mathematicians can use bold type 
for special purposes, but bold type is difficult to reproduce on the blackboard 
with a piece of chalk. These letters are typically used for number systems: 


• ℕ for the natural numbers 1, 2, 3, ...

• ℤ for the integers ..., −2, −1, 0, 1, 2, ...

• ℚ for the rational numbers or fractions such as 3/2

• ℝ for the real numbers, including √2 and π

• ℂ for the complex numbers, including i (the square root of −1).


Most of these letters are self-explanatory, but why Z and Q? The German word 
for numbers is Zahlen, which gives us the Z. The rational numbers cannot be R, 
so remember Q for quotients. 


1.2. Proofs. The real answer to our earlier question ‘What is mathemat- 
ics?’ is: Mathematics is about proofs. A proof is nothing but an argument to 
convince you of the truth of some assertion. Mathematical statements require 
proofs, which should be completely convincing, though you might have to work 
to understand the details. If, after a lot of effort, you are not convinced by an
argument, then either the author has not made it clear, or the argument is not
correct.

Table 1.1 The Greek alphabet

Name      Capital   Lowercase
alpha     Α         α
beta      Β         β
gamma     Γ         γ
delta     Δ         δ
epsilon   Ε         ε
zeta      Ζ         ζ
eta       Η         η
theta     Θ         θ
iota      Ι         ι
kappa     Κ         κ
lambda    Λ         λ
mu        Μ         μ
nu        Ν         ν
xi        Ξ         ξ
omicron   Ο         ο
pi        Π         π
rho       Ρ         ρ
sigma     Σ         σ
tau       Τ         τ
upsilon   Υ         υ
phi       Φ         φ
chi       Χ         χ
psi       Ψ         ψ
omega     Ω         ω

The proofs should ultimately be founded on logic; but we will not be too 
precise now about what constitutes a logically valid argument. 

Here are two fine examples of proofs, from the time of ancient Greek 
mathematics. In each case, the statement is not at all obvious, but the proof 
persuades you that it must be true. In each case, the strategy is what we call 
‘proof by contradiction’: that is, we show that assuming the opposite of what we 
are trying to prove leads to an absurdity or contradiction. Also, in each case, the 
proof has an ingenious twist. 

The first theorem, probably due to Euclid, states that the series of prime 
numbers goes on for ever; there is no largest prime number. (A prime number 
is a natural number p greater than 1 which is not divisible by any natural numbers 
except for itself and 1. Notice that this definition says that the number 1 is not 
a prime number, even though it has no divisors except itself and 1. This makes 
sense; we will see the reason later.) 
Theorem 1.1 There are infinitely many prime numbers.


Proof Our strategy is to show that the statement must be true because, if we 
assume that it is false, then we are led to an impossibility. 

So we suppose that there are only finitely many primes. Let there be n primes,
and let them be p_1, p_2, ..., p_n. Now consider the number N = p_1 p_2 ⋯ p_n + 1. That
is, N is obtained by multiplying together all the prime numbers and adding 1.

Now N must have a prime factor (this is a property of natural numbers which
we will examine further later on). This prime factor must be one of p_1, ..., p_n
(since by assumption, these are all the prime numbers). But this is impossible,
since N leaves a remainder of 1 when it is divided by any of p_1, ..., p_n.


Thus our assumption that there are only finitely many primes leads to a 
contradiction, so this assumption must be false; there must be infinitely many 
primes. 
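The construction used in this proof can be tried out numerically. The following Python sketch (the function names are illustrative choices, not part of the text) takes the first few primes, multiplies them together and adds 1, and then finds the smallest prime factor of the result by trial division. In every case that factor is missing from the list we started with, which is the point of the argument. The number N need not itself be prime.

```python
def smallest_prime_factor(n):
    """Return the smallest prime factor of an integer n > 1 by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_number(primes):
    """Multiply the given primes together and add 1."""
    product = 1
    for p in primes:
        product *= p
    return product + 1


first_primes = [2, 3, 5, 7, 11, 13]
for k in range(1, len(first_primes) + 1):
    chosen = first_primes[:k]
    N = euclid_number(chosen)
    q = smallest_prime_factor(N)
    # N leaves remainder 1 on division by each chosen prime,
    # so its prime factor q cannot be one of them.
    assert all(N % p == 1 for p in chosen)
    assert q not in chosen
    print(chosen, N, q)
```

For the six primes above, N = 30031 = 59 × 509, so N is composite, yet both of its prime factors are new.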


The second theorem was proved by Pythagoras (or possibly one of his students).
This theorem is surrounded by legend: supposedly Hippasus, a disciple
of Pythagoras, was killed (in a shipwreck) by the gods for revealing the disturbing
truth that there are ‘irrational’ numbers.


Theorem 1.2 √2 is irrational; that is, there is no number x = p/q (where p
and q are whole numbers) such that x^2 = 2.


Proof Again the proof is by contradiction. Thus, we assume that there is a
rational number p/q such that (p/q)^2 = 2, where p and q are integers. We can
suppose that the fraction p/q is in its lowest terms; that is, p and q have no
common factor.

Now p^2 = 2q^2. Thus, the number p^2 is even, from which it follows that p
must be even. (The square of any odd number is odd: for any odd number has
the form 2m + 1, and its square is (2m + 1)^2 = 4m(m + 1) + 1, which is odd.)
Let us write p = 2r. Now our equation becomes 4r^2 = 2q^2, or 2r^2 = q^2. Thus,
just as before, q^2 is even, and so q is even.

But if p and q are both even, then they have the common factor 2, which
contradicts our assumption that the fraction p/q is in its lowest terms.
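No finite computation can replace this proof, but a small experiment may make the statement more concrete. The Python sketch below (the function name and the search bound are arbitrary choices made for illustration) looks for whole numbers p and q with p^2 = 2q^2 and finds none, and also spot-checks the identity used in the parity step.

```python
from math import isqrt


def find_rational_sqrt2(max_q):
    """Search for integers p, q with 1 <= q <= max_q and p*p == 2*q*q."""
    for q in range(1, max_q + 1):
        p = isqrt(2 * q * q)  # the only possible candidate for p
        if p * p == 2 * q * q:
            return (p, q)
    return None  # no solution with denominator up to max_q


# The parity step of the proof: the square of an odd number is odd,
# because (2m + 1)^2 = 4m(m + 1) + 1.
for m in range(1000):
    assert (2 * m + 1) ** 2 == 4 * m * (m + 1) + 1

print(find_rational_sqrt2(10_000))
```

The search returns `None` for every bound, as the theorem guarantees; the proof is what tells us that raising the bound will never help.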


Now we look at a few proof techniques, and introduce some new terms. 


Proof by contradiction We have just seen two examples of this. In order to 
prove a statement P, we assume that P is false, and derive a contradiction from 
this assumption. 


Proof by contrapositive The contrapositive of the statement ‘if P, then
Q’ is the statement ‘if not Q, then not P’. This is logically equivalent to the
original statement; so we can prove this instead if it is more convenient.


Converse Do not confuse the contrapositive of a statement with its converse.
The converse of ‘if P, then Q’ is ‘if Q, then P’. This is not logically equivalent to
the original statement. For example, it can be shown that the statement ‘if 2^n − 1
is prime, then n is prime’ is true; but its converse, ‘if n is prime, then 2^n − 1 is
prime’ is false: the number n = 11 is prime, but 2^11 − 1 = 2047 = 23 × 89.
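The difference between contrapositive and converse can be checked mechanically by running through all four combinations of truth values for P and Q. The sketch below assumes the usual logical reading of ‘if P, then Q’ as ‘(not P) or Q’; that reading is an assumption made here for illustration, and the helper name `implies` is hypothetical.

```python
from itertools import product


def implies(p, q):
    """Truth value of 'if p, then q' (assumed reading: (not p) or q)."""
    return (not p) or q


rows = list(product([False, True], repeat=2))

# Contrapositive: 'if not Q, then not P' agrees with 'if P, then Q' in every row.
contrapositive_equivalent = all(
    implies(p, q) == implies(not q, not p) for p, q in rows
)

# Converse: 'if Q, then P' disagrees with 'if P, then Q' in at least one row.
converse_equivalent = all(implies(p, q) == implies(q, p) for p, q in rows)

print(contrapositive_equivalent)  # True
print(converse_equivalent)        # False
```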


This example by Lewis Carroll might help you remember the difference 
between a statement and its converse. 


‘Come, we shall have some fun now!’ thought Alice. ‘I’m glad 
they’ve begun asking riddles.—I believe I can guess that,’ she added 
aloud. 

‘Do you mean that you think you can find out the answer to it?’ 
said the March Hare. 

‘Exactly so,’ said Alice. 

‘Then you should say what you mean,’ the March Hare went on. 
‘I do,’ Alice hastily replied; ‘at least—at least I mean what I say— 
that’s the same thing, you know.’ 

‘Not the same thing a bit!’ said the Hatter. ‘You might just as
well say that “I see what I eat” is the same thing as “I eat what
I see”!’

‘You might just as well say,’ added the March Hare, ‘that
“I like what I get” is the same thing as “I get what I like”!’

‘You might just as well say,’ added the Dormouse, who seemed to
be talking in his sleep, ‘that “I breathe when I sleep” is the same
thing as “I sleep when I breathe”!’

‘It is the same thing with you,’ said the Hatter, and here the con- 
versation dropped, and the party sat silent for a minute, while 
Alice thought over all she could remember about ravens and 
writing-desks, which wasn’t much. 


Counterexample Given a general statement P, to show that P is true it is 
necessary to give a general proof; but to show that P is false, we have to give one 
specific instance in which it fails. Such an instance is called a counterexample. 
In the example of the converse above, the number n = 11 is a counterexample to the
general statement ‘if n is prime, then 2^n − 1 is prime’.
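Hunting for a counterexample is a task that lends itself to computation. As a minimal sketch (the function names and the search limit are illustrative choices), the following Python code tests each prime n in turn and reports the first one for which 2^n − 1 fails to be prime.

```python
def is_prime(n):
    """Trial-division primality test for a natural number n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def first_counterexample(limit):
    """Smallest prime n < limit for which 2**n - 1 is not prime, or None."""
    for n in range(2, limit):
        if is_prime(n) and not is_prime(2**n - 1):
            return n
    return None


n = first_counterexample(20)
print(n, 2**n - 1)  # 11 2047
```

One failing instance is enough to refute the general statement; the primes 2, 3, 5, and 7, for which 2^n − 1 is prime, prove nothing in its favour.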


Sufficient condition, ‘if’ We say that P is a sufficient condition for Q 
if the truth of P implies the truth of Q; that is, P implies Q. Another way of 
saying the same thing is ‘if P, then Q’, or ‘Q if P’. In symbols, we write P ⇒ Q.


Necessary condition, ‘only if’ We say that P is a necessary condition 
for Q if the truth of P is implied by the truth of Q, that is, Q implies P. (This 
is the converse of the statement that P implies Q.) We also say ‘Q only if P’. 


Necessary and sufficient condition, ‘if and only if’ We say that P is 
a necessary and sufficient condition for Q if both of the above hold, that 
is, each of P and Q implies the other. We also say ‘P if and only if Q’. Note 
that there are two things to prove: that P implies Q, and that Q implies P. In 
symbols, we write P ⇔ Q.
Proof by induction This is a very important technique for proving things
about natural numbers. We discuss it later in this chapter. 


1.3. Axioms. In the proofs in the last section, we used various properties of 
numbers: every integer greater than 1 has a prime factor; any number is either 
odd or even; and any fraction can be put into its lowest terms by cancelling 
common factors. Later on in the book we will examine these assumptions. 

The process of examining our hidden assumptions is very important in math- 
ematics. Each assumption should be proved, but the proof will probably involve 
more basic assumptions. There is a sense in which everything can be built from 
nothing using only the processes of logic. Usually this is much too long-winded; 
so instead we start by making our basic assumptions explicit. 

It used to be thought that the basic assumptions of mathematics were true 
statements about the real world. Euclid’s geometry was the model for many 
centuries. Euclid begins with axioms, which he regarded as ‘self-evident truths’, 
and deduced a huge body of theorems from them. But one of his axioms, the 
‘axiom of parallels’, is far from self-evident. Mathematicians tried hard to prove 
it, but eventually were forced to admit that it was possible to construct a kind of 
geometry in which this axiom is false. (This is now referred to as non-Euclidean 
geometry.) 

Now we regard the axioms as starting points which we choose, depending on 
the branch of mathematics we are studying. The theorems we prove will be true 
in any system (including any real-world system) which happens to satisfy the 
axioms. 

One of the advantages of this approach is that, instead of proving theorems 
about, say, the integers, we can prove theorems about ‘principal ideal domains’; 
as long as the integers satisfy the axioms for principal ideal domains, our theo- 
rems will be true in the integers. This is how we shall justify the assumptions of 
the last section about primes and common factors. 

It is very important, however, not to bring in any hidden assumptions. For 
example, if we are doing geometry, the axioms will probably refer to points and 
lines; we must only use properties of points and lines specified in the axioms, 
rather than our commonsense view of how points and lines behave. 

The German mathematician David Hilbert put it like this: 


One must be able at any time to replace ‘points, lines, and planes’ 
with ‘tables, chairs, and beer mugs’. 


Here is a small example. Suppose that we are doing geometry with just the 
following three of Euclid’s axioms: 


(1) Any two points lie on a unique line. 

(2) If the point P does not lie on the line L, then there is exactly one line L’
passing through P and parallel to L.

(3) There exist three non-collinear points. 
We understand that ‘collinear’ means ‘lying on a common line’, and that two lines
are ‘parallel’ if no point lies on both. Notice that if two lines are not parallel then 
they have exactly one common point (for more than one common point would 
violate Axiom (1)). 

According to Hilbert’s dictum, it would be equally valid to begin 


(1) Any two tables lie on a unique chair.
(2) ...

From these axioms, we can prove the following theorem:


Theorem 1.3 Two lines parallel to the same line are parallel to one another. 


Proof Let L’ and L” be two distinct lines both parallel to L. Arguing by contradiction,
suppose that L’ and L” are not parallel. Then they have a point P in
common, and P does not lie on L, since L’ is parallel to L. But now there are two
lines L’ and L” containing P and parallel to L, contradicting Axiom (2).


This is ‘obviously’ true in the ordinary Euclidean plane, but we have proved 
it in any geometry satisfying the axioms. Here is a less obvious example: 

Points: A, B, C, D, E, F, G, H, I

Lines: ABC, DEF, GHI, ADG, BEH, CFI, AEI, BFG, CDH, AFH, BDI, CEG.


It is some labour to verify the axioms, but once this is done then the conclusion of
the theorem must hold. Indeed, the lines DEF and GHI are both parallel to ABC,
and they are parallel to one another. Here we seem to be a long way from traditional
geometry, and it does not seem so stupid to say that A, B, C, ... are tables and
ABC, DEF, ... are chairs, and that any two tables lie on a unique chair!
An even simpler example is the following: 

Points: A, B,C, D 

Lines: AB, CD, AC, BD, AD, BC. 
In this case, there is only one line parallel to a given one, so the theorem holds 
‘vacuously’: we cannot choose two lines L’ and L” parallel to L. This is a bit 
puzzling at first: what is going on here? 
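For finite examples like the two above, the labour of verifying the axioms can be handed to a computer. The following Python sketch is one possible way to do this; the function names are illustrative, each line is represented simply as the set of its points, and ‘parallel’ is taken to mean ‘no point lies on both’, as in the text. It checks Axioms (1)–(3) in both geometries and then lists every case to which Theorem 1.3 applies.

```python
from itertools import combinations


def parallel(l, m):
    """Two lines are parallel if no point lies on both."""
    return not (l & m)


def check_axioms(points, lines):
    """Return the truth values of Axioms (1), (2), (3) for a finite geometry."""
    # (1) Any two points lie on a unique line.
    a1 = all(
        sum(1 for l in lines if p in l and q in l) == 1
        for p, q in combinations(points, 2)
    )
    # (2) If P is not on L, exactly one line through P is parallel to L.
    a2 = all(
        sum(1 for m in lines if p in m and parallel(l, m)) == 1
        for l in lines
        for p in points
        if p not in l
    )
    # (3) Some three points lie on no common line.
    a3 = any(
        not any(set(triple) <= l for l in lines)
        for triple in combinations(points, 3)
    )
    return a1, a2, a3


def theorem_1_3_cases(lines):
    """For each pair of distinct lines parallel to a common line,
    record whether the two are parallel to one another."""
    cases = []
    for l in lines:
        parallels = [m for m in lines if parallel(l, m)]
        for m1, m2 in combinations(parallels, 2):
            cases.append(parallel(m1, m2))
    return cases


nine = "ABCDEFGHI"
nine_lines = [frozenset(s) for s in
              "ABC DEF GHI ADG BEH CFI AEI BFG CDH AFH BDI CEG".split()]
four = "ABCD"
four_lines = [frozenset(s) for s in "AB CD AC BD AD BC".split()]

print(check_axioms(nine, nine_lines), theorem_1_3_cases(nine_lines))
print(check_axioms(four, four_lines), theorem_1_3_cases(four_lines))
```

In the nine-point geometry the theorem is tested in twelve cases and holds in each. In the four-point geometry the list of cases is empty, so there is nothing to test: the theorem holds there only in the sense that it has no instances to fail on.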

A statement of the form ‘If P, then Q’ is true, according to the rules of logic, 
if P is false. We discuss this further on page 60. If P can never be true, we 
sometimes say that the statement is ‘vacuously’ true. 

Non-Euclidean geometry was discovered in the nineteenth century. By the 
early twentieth century, the ‘axiomatic method’ had become the paradigm for 
mathematics. 


Exercise 1.1 Prove from Axioms (1)–(3) the following assertions:


(a) Any line passes through at least two points. 
(b) Any two lines pass through the same number of points. 
Exercise 1.2 Give an example of a system of points and lines satisfying Axioms (1)
and (3) but not (2) (a ‘non-Euclidean geometry’). 


Exercise 1.3 Let n be a natural number. Show that n^2 is even if and only if n is
even. (We say that n is even if n = 2m for some natural number m, and is odd if
n = 2m + 1 for some natural number m. The exercise asks you to show two things: if
n is even then n^2 is even; and if n^2 is even, then n is even. In this question you are
permitted to use the fact that every natural number is either even or odd: the proof
of this obvious-looking assertion is the subject of Exercise 1.12 later on.)


Exercise 1.4 Let the prime numbers, in order of magnitude, be p_1, p_2, .... Prove that
p_{n+1} ≤ p_1 p_2 ⋯ p_n + 1.


Exercise 1.5 (a) Prove that, for any prime number p, √p is irrational.
(b) Prove that the cube root of 2 is irrational. 


Exercise 1.6 Fill in the details in the following argument. 


Proposition 1.4 If n is a positive integer which is not a perfect square, then
√n is irrational.


Proof Suppose that √n = a/b, where a/b is a fraction in its lowest terms.
Then a/b = nb/a, so the fractional parts of these two numbers are equal, say
d/b = c/a, where 0 < c < a and 0 < d < b. Then a/b = c/d, contradicting the
assumption that a/b is in its lowest terms.


(This argument is taken from The Book of Numbers, by J. H. Conway and 
R. K. Guy.) 


Exercise 1.7 Can you prove that, if 2^n − 1 is prime, then n is prime? (We will see the
proof later in this chapter.) 


Exercise 1.8 (a) Write down the converse of the statement 


If n is an even integer greater than 2, then n is the sum of two prime numbers. 
(b) Is the converse true or false? Why? 


Remark The statement given in (a) is a famous conjecture due to Goldbach. 
It is believed to be true, but this is not yet known. 


Exercise 1.9 Is the following argument valid? If not, why not? 


We are going to prove that a triangle whose sides have lengths 3, 4, and 5 is right-angled. 
By Pythagoras’ Theorem, if a triangle with sides a, b, c is right-angled, with
hypotenuse c, then a^2 + b^2 = c^2.
Now 3^2 + 4^2 = 9 + 16 = 25 = 5^2.
So the triangle is right-angled. 
Numbers


Algebraic formulae often have symbols in them: x, a, and so on. In elementary 
algebra we think of these as numbers. But the domain we consider has an effect 
on whether the equations have solutions or not. 


1.4 The number systems. We consider briefly the different kinds of num- 
ber systems used in elementary algebra. You should be familiar with most of 
these. In Chapter 6, we will go into more detail on exactly how the different 
kinds of number are constructed. 


The natural numbers The natural numbers are the ones we use to count: 
1, 2, 3, and so on. They are sometimes called counting numbers. Actually, there 
is no agreement among mathematicians about whether 0 should be included 
as a natural number or not. Historically, the positive numbers arose (for use 
in counting) before the dawn of history, whereas zero is a much more recent 
and problematic invention. It is also more difficult for children to grasp. Brian 
Butterworth, an expert on the development of number sense in childhood, says, 
in his book The Mathematical Brain: 


Although the idea that we have no bananas is unlikely to be a 
new one, or one that is hard to grasp, the idea that no bananas, 
no sheep, no children, no prospects are really all the same, in that 
they have the same numerosity, is a very abstract one. 


Logically, however, it makes sense to count zero as the smallest natural number, 
as we will see. 

Fortunately, it does not very much matter what view we take about this. 

The set of natural numbers is denoted by N. 

The important property of natural numbers to an algebraist is that they can 
be added and multiplied. If one heap contains m beans and another has n beans, 
then together the two heaps contain m+n beans. Moreover, if we arrange some 
beans in a rectangular array with m rows and n columns, then mn beans are 
required. 

These operations satisfy some simple laws, sometimes called the laws of 
arithmetic: 


• m + n = n + m and mn = nm (the commutative laws);
• m + (n + p) = (m + n) + p and m(np) = (mn)p (the associative laws);
• (m + n)p = mp + np (the distributive law).


In addition, adding zero, or multiplying by one, leaves any natural number 
unchanged. 

The bean-counting interpretation allows us to picture these laws; some people 
find that the pictures provide convincing explanations. For example, Figure 1.1 
shows the distributive law. 

Fig. 1.1 (5 + 3) · 4 = 5 · 4 + 3 · 4

The reverse operations are not always possible. Subtraction, defined by
requiring that m − n is a number x such that n + x = m, is only possible if
m is at least as large as n (in symbols, m ≥ n). Division, defined by requiring
that m/n is a number y such that ny = m, is only possible if m is a multiple of n
(in symbols, n | m). Warning: Be sure to distinguish between m/n (a number),
and n | m (a statement, which is either true or false). If n does not divide m, we
write n ∤ m.

We already saw Euclid’s proof that there are infinitely many prime numbers. 
Of course there are infinitely many composite numbers too: for example, every 
even number greater than 2 is composite. (A number n > 1 is composite if it 
is not prime.) 

The natural numbers have a very important property, sometimes called the 
induction property. 


Theorem 1.5 Let S be any set of natural numbers. Suppose that 


(a) 0 belongs to S; 
(b) for any natural number n, if n belongs to S, then n+1 belongs to S. 


Then S = N, that is, S is the set of all natural numbers. 


This theorem is true because the natural numbers are the ‘counting numbers’; 
that is, given any natural number n, it is possible (at least in principle) to start 
at zero and count up to n: ‘zero, one, two, three, ..., n’. Now the first number 
in the chain is in S; and as soon as we know that a number is in S then the next 
number is in S too. After n steps we find that n is in S. 

Sometimes this is called the ‘domino property’. Imagine we have an infinite 
number of dominoes standing in a line, labelled 0,1,2,.... The dominoes are 
arranged in such a way that, if number n falls, it will knock over number n+ 1. 
Then, if we knock over domino number 0, we can be sure that all the dominoes 
will fall. This is exactly what the induction property says, with S as the set of 
labels of dominoes that fall over. See Figure 1.2. 

Even if m is not a multiple of n, all is not lost. At school we learn the division 
algorithm: 


Theorem 1.6 (Division algorithm for natural numbers) Let m and n be 
any natural numbers with n > 0. Then there exist natural numbers q and r such 
that 


(a) m = nq + r; 
(b) r < n. 


Moreover, q and r are unique; that is, if also m = nq′ + r′, where r′ < n, then 
q = q′ and r = r′. 


Fig. 1.2 Which dominoes will fall? 


The numbers q and r are called the quotient and remainder when m is 
divided by n. (The numbers m and n are sometimes called the dividend and 
the divisor.) 


Proof First we show the uniqueness. Suppose that m = nq + r = nq′ + r′ with 
r < n and r′ < n. If r = r′, then nq = nq′, so q = q′. Suppose that r ≠ r′. Then 
one of them is larger; say r > r′. Then 

\[ r - r' = n(q' - q); \]

so the same non-zero natural number r − r′ is both less than n and a multiple 
of n, which is not possible. 

Now we show the existence. Consider the multiples of n: n, 2n, 3n, .... 
Eventually we reach one which is greater than m (for certainly (m + 1)n > m). Let 
q be the last integer x for which xn ≤ m; that is, nq ≤ m but n(q + 1) > m. (It 
may be that q = 0.) Put r = m − nq. Then r ≥ 0 but r < n; and m = nq + r. 
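
The existence argument can be read as a procedure. The following Python sketch (the function name `divide` is our own choice, not notation from the text) steps through the multiples of n until the next one would exceed m, and then reads off q and r. It is meant only as an illustration of the proof, not as an efficient method.

```python
def divide(m: int, n: int) -> tuple[int, int]:
    """Return (q, r) with m == n*q + r and 0 <= r < n, for natural m and n > 0."""
    if m < 0 or n <= 0:
        raise ValueError("need m >= 0 and n > 0")
    q = 0
    # Step through the multiples of n: stop at the last q with n*q <= m.
    while n * (q + 1) <= m:
        q += 1
    r = m - n * q
    return q, r


q, r = divide(17, 5)
print(q, r)  # 3 2, since 17 = 5*3 + 2
```

Python's built-in `divmod(m, n)` returns the same pair for natural numbers m and n > 0.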


The integers As we have seen, subtraction is not always possible for natural 
numbers. To get round this, we enlarge the number system to include negative 
numbers as well as positive numbers and zero, giving the set 


\[ \mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\} \]


of integers. Thus, we can add, subtract, and multiply integers. The laws we saw 
for natural numbers extend to the integers. 


We enlarge the number system because we are trying to solve equations 
which cannot be solved in the original system. At every stage in the process, 
people first thought that the new numbers were just aids to calculating, and 
not ‘proper’ numbers. The names given to them reflect this: negative numbers, 
improper fractions, irrational numbers, imaginary numbers! Only later were they 
fully accepted. You may like to read the book Imagining Numbers by Barry 
Mazur, about the long process of accepting imaginary numbers. 

The natural numbers 1, 2, ... are positive, while −1, −2, ... are negative. 
Integers satisfy the law of signs: the product of a positive and a negative number 
is negative, while the product of two negative numbers is positive. 
The rational numbers Similarly, division is not always possible for integers. 
To get round this, we enlarge the number system to the set Q of rational 
numbers, of the form m/n where n ≠ 0. By cancellation, we may assume that 
n > 0 and that the fraction is in its ‘lowest terms’, that is, m and n have no 
common factor. For example, 20/(−12) is the same as −5/3. 

We can write rules for adding and multiplying rational numbers: 


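\[ \frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}, \qquad \frac{a}{b} - \frac{c}{d} = \frac{ad - bc}{bd}, \]
\[ \frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}, \qquad \frac{a/b}{c/d} = \frac{ad}{bc} \quad \text{if } c \neq 0. \]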


The last rule says: to divide by a fraction, turn it upside down and multiply. 
Thus, we can add, subtract, multiply, and divide rationals (except for division 
by zero). The usual laws extend to the rational numbers. 


The real numbers There are still many equations we cannot solve with rational 
numbers. One such equation is x² = 2. (We saw Pythagoras’ proof of this 
in Theorem 1.2.) Other equations involve functions from trigonometry (such as 
sin x = 1, which has the irrational solution x = π/2) and calculus (such as 
log x = 1, which has the irrational solution x = e). 

So, we take a larger number system in which these equations can be solved, 
the real numbers. A real number is a number that can be represented as an 
infinite decimal. This includes all the rational numbers and many more, including 
the solutions of the three equations above; for example, 


\begin{align*}
\tfrac{2}{5} &= 0.4, \\
\tfrac{1}{7} &= 0.142857142857\ldots, \\
\sqrt{2} &= 1.41421356237\ldots, \\
\tfrac{\pi}{2} &= 1.57079632679\ldots, \\
e &= 2.71828182846\ldots
\end{align*}


In the last three cases, we cannot write out the number exactly as a decimal, 
but the approximation gets better as the number of digits increases. 

The arithmetic operations (excluding division by zero) extend from Q to R, 
and the laws of arithmetic continue to hold. 

The completeness of R (the fact that there are no gaps) is shown by various 
results from analysis such as the Intermediate Value Theorem: a continuous 
function cannot go from negative to positive values without passing through zero. 


The complex numbers Although there are no gaps in the real numbers, there 
are still some equations which cannot be solved. For example, the square of any 
real number is non-negative, so there is no real number x satisfying the equation 

\[ x^2 = -1. \]

We enlarge the real numbers to the set C of complex numbers by adjoining a 
special number i satisfying this equation. Thus, complex numbers are expressions 
of the form x + yi, where x and y are real numbers. The rules for addition and 
multiplication are exactly what you would expect, except that i² is replaced by 
−1 whenever it appears. Thus, 

\begin{align*}
(x_1 + y_1 i) + (x_2 + y_2 i) &= (x_1 + x_2) + (y_1 + y_2) i, \\
(x_1 + y_1 i)(x_2 + y_2 i) &= (x_1 x_2 - y_1 y_2) + (x_1 y_2 + x_2 y_1) i.
\end{align*}
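
As an illustration, the two rules can be written out for complex numbers stored as pairs (real part, imaginary part). This is only a sketch: the function names `add` and `mul` are our own, and the final line cross-checks the result against Python's built-in complex type.

```python
def add(z, w):
    """Add two complex numbers given as (real, imaginary) pairs."""
    (x1, y1), (x2, y2) = z, w
    return (x1 + x2, y1 + y2)


def mul(z, w):
    """Multiply two (real, imaginary) pairs, replacing i*i by -1."""
    (x1, y1), (x2, y2) = z, w
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)


z, w = (3, 2), (1, 1)
print(add(z, w))  # (4, 3)
print(mul(z, w))  # (1, 5), that is, (3 + 2i)(1 + i) = 1 + 5i

# Cross-check against Python's built-in complex numbers.
assert complex(*mul(z, w)) == complex(*z) * complex(*w)
```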


The number i is sometimes called ‘imaginary’, since at first it seemed to 
mathematicians to be less ‘real’ than the real numbers. The term ‘complex’, 
on the other hand, is not meant to suggest that the complex numbers are more 
difficult to understand than the real numbers, but only that each complex number 
x + yi is made up of a kind of ‘compound’ of two real numbers x and y; we call 
x and y the real and imaginary parts of x + yi. The complex number x − yi is 
called the complex conjugate of z = x + yi, and is written z̄. 

All the arithmetic operations (except, as usual, division by zero) are possible, 
and the laws of arithmetic hold. Here, unlike for the other forms of numbers, 
we do not have to take on trust that the laws hold; we can prove them for 
complex numbers (assuming their truth for real numbers). Here, for example, is 
the distributive law. Let z₁ = x₁ + y₁i, z₂ = x₂ + y₂i, and z₃ = x₃ + y₃i. Now 

\begin{align*}
z_1(z_2 + z_3) &= (x_1 + y_1 i)((x_2 + x_3) + (y_2 + y_3) i) \\
&= (x_1(x_2 + x_3) - y_1(y_2 + y_3)) + (x_1(y_2 + y_3) + y_1(x_2 + x_3)) i,
\end{align*}

and 

\begin{align*}
z_1 z_2 + z_1 z_3 &= ((x_1 x_2 - y_1 y_2) + (x_1 y_2 + x_2 y_1) i) \\
&\quad + ((x_1 x_3 - y_1 y_3) + (x_1 y_3 + x_3 y_1) i) \\
&= (x_1 x_2 - y_1 y_2 + x_1 x_3 - y_1 y_3) + (x_1 y_2 + x_2 y_1 + x_1 y_3 + x_3 y_1) i,
\end{align*}

and a little bit of rearranging shows that the two expressions are the same. 

Example 

\[ \frac{2 - 3i}{4 + i} = \frac{(2 - 3i)(4 - i)}{4^2 + 1^2} = \frac{5 - 14i}{17}, \]

which can be verified by multiplying the result by 4 + i. 


Moreover, quadratic, cubic, and higher-degree equations can always be solved 
in the complex numbers. (This is the Fundamental Theorem of Algebra, 
proved by Gauss.) 

No further enlargements of the number system are possible without sacrificing 
some properties. 
The rules for addition and subtraction can be put like this: 


To add or subtract complex numbers, we add or subtract their 
real parts and their imaginary parts. 


The rule for multiplication looks more complicated as we have written it out. 
There is another representation of complex numbers which makes it look simpler. 
Let z = x + yi, and suppose that z ≠ 0. We define the modulus and argument 
of z by 

\[ |z| = \sqrt{x^2 + y^2}, \]
\[ \arg(z) = \theta \quad \text{where } \cos\theta = x/|z| \text{ and } \sin\theta = y/|z|. \]

In other words, if |z| = r and arg(z) = θ, then 

\[ z = r(\cos\theta + i\sin\theta). \]

For example, let z = 1 + i. Then the modulus of z is 

\[ |z| = \sqrt{1^2 + 1^2} = \sqrt{2}, \]

and the argument θ satisfies cos θ = 1/√2 and sin θ = 1/√2, so that θ = π/4. 


Now the rules for multiplication and division are 


To multiply two complex numbers, multiply their moduli and add 
their arguments. To divide two complex numbers, divide their 
moduli and subtract their arguments. 
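
The rule can be tried out numerically. The sketch below uses Python's standard `cmath` module, whose `polar` function returns the pair (modulus, argument) and whose `rect` function rebuilds a complex number from such a pair; the sample values z1 and z2 are our own choice. Since the argument is only determined up to multiples of 2π, we compare the resulting complex numbers rather than the angles themselves.

```python
import cmath
import math

z1 = 1 + 1j  # modulus sqrt(2), argument pi/4
z2 = 1j      # modulus 1, argument pi/2

r1, t1 = cmath.polar(z1)  # polar returns (modulus, argument)
r2, t2 = cmath.polar(z2)

# Multiply the moduli and add the arguments ...
product = cmath.rect(r1 * r2, t1 + t2)
# ... and compare with ordinary multiplication (up to rounding error).
assert cmath.isclose(product, z1 * z2, abs_tol=1e-12)

# Divide the moduli and subtract the arguments.
quotient = cmath.rect(r1 / r2, t1 - t2)
assert cmath.isclose(quotient, z1 / z2, abs_tol=1e-12)

print(math.isclose(r1, math.sqrt(2)), math.isclose(t1, math.pi / 4))  # True True
```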


The complex plane, or Argand diagram The complex numbers can be 
represented geometrically, by points in the Euclidean plane (which is usually 
referred to as the Argand diagram or the complex plane for this purpose). 
The complex number z = x + yi is represented as the point with coordinates 
(x, y). Then |z| is the length of the line from the origin to the point z, and arg(z) 
is the angle between this line and the x-axis. See Figure 1.3. 

In terms of the complex plane, we can give a geometric description of addition 
and multiplication of complex numbers. The addition rule is the parallelogram 
rule (see Figure 1.4). 

Multiplication is a little bit more complicated. Let z be a complex number 
with modulus r and argument θ, so that z = r(cos θ + i sin θ). Then the way to 
multiply an arbitrary complex number by z is a combination of a stretch and 
a rotation: first we expand the plane so that the distance of each point from 
the origin is multiplied by r; then we rotate the plane through an angle θ. See 
Figure 1.5, where we are multiplying by 1 + i = √2(cos(π/4) + i sin(π/4)); the 
dots represent the stretching out by a factor of √2, and the circular arc represents 
the rotation by π/4. 

Fig. 1.3 The Argand diagram (the point z = a + bi, with |z| = r, a = r cos θ, 
and b = r sin θ) 

Fig. 1.4 Addition of complex numbers (the sum z₁ + z₂) 

Now let us check the correctness of our rule for multiplying complex numbers. 
Remember that the rule is: to multiply two complex numbers, we multiply the 
moduli and add the arguments. To see that this is correct, suppose that z₁ and 
z₂ are two complex numbers; let their moduli be r₁ and r₂, and their arguments 
θ₁ and θ₂. Then 

\begin{align*}
z_1 &= r_1(\cos\theta_1 + i\sin\theta_1), \\
z_2 &= r_2(\cos\theta_2 + i\sin\theta_2).
\end{align*}

Then 

\begin{align*}
z_1 z_2 &= r_1 r_2(\cos\theta_1 + i\sin\theta_1)(\cos\theta_2 + i\sin\theta_2) \\
&= r_1 r_2((\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + (\cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2) i) \\
&= r_1 r_2(\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)),
\end{align*}

which is what we wanted to show. 
From this we can prove De Moivre’s Theorem. 

Fig. 1.5 Multiplication of complex numbers: (3 + 2i)(1 + i) = 1 + 5i 



Theorem 1.7 For any natural number n, we have 

\[ (\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta. \]

Proof The proof is by induction. Starting the induction is easy since 
(cos θ + i sin θ)⁰ = 1 and cos 0 + i sin 0 = 1. 

For the inductive step, suppose that the result is true for n, that is, 

\[ (\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta. \]

Then 

\begin{align*}
(\cos\theta + i\sin\theta)^{n+1} &= (\cos\theta + i\sin\theta)^n \cdot (\cos\theta + i\sin\theta) \\
&= (\cos n\theta + i\sin n\theta)(\cos\theta + i\sin\theta) \\
&= \cos(n+1)\theta + i\sin(n+1)\theta,
\end{align*}

which is the result for n + 1. So the proof by induction is complete. 

Note that, in the second line of the chain of equations, we have used the 
inductive hypothesis, and in the third line, we have used the rule for multiplying 
complex numbers. 


The argument is clear if we express it geometrically. To multiply by the 
complex number (cos θ + i sin θ)ⁿ, we rotate n times through an angle θ, which 
is the same as rotating through an angle nθ. 

De Moivre’s Theorem is useful in deriving trigonometrical formulae. For 
example, 

\begin{align*}
\cos 3\theta + i\sin 3\theta &= (\cos\theta + i\sin\theta)^3 \\
&= (\cos^3\theta - 3\cos\theta\sin^2\theta) + (3\cos^2\theta\sin\theta - \sin^3\theta) i,
\end{align*}

so 

\begin{align*}
\cos 3\theta &= \cos^3\theta - 3\cos\theta\sin^2\theta, \\
\sin 3\theta &= 3\cos^2\theta\sin\theta - \sin^3\theta.
\end{align*}

These can be converted into the more familiar forms cos 3θ = 4 cos³ θ − 3 cos θ 
and sin 3θ = 3 sin θ − 4 sin³ θ by using the equation cos² θ + sin² θ = 1. 
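
A numerical spot check can help to catch slips in formulae of this kind. The sketch below (the helper name `de_moivre_sides` and the sample angles are our own) compares both sides of De Moivre’s Theorem for n = 3, and then the two triple-angle formulae, up to rounding error. Agreement at a few angles is of course not a proof.

```python
import cmath
import math


def de_moivre_sides(theta: float, n: int) -> tuple[complex, complex]:
    """Return ((cos t + i sin t)**n, cos nt + i sin nt) for comparison."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return lhs, rhs


for theta in (0.3, 1.0, 2.5):
    lhs, rhs = de_moivre_sides(theta, 3)
    assert cmath.isclose(lhs, rhs, abs_tol=1e-12)
    c, s = math.cos(theta), math.sin(theta)
    # The 'more familiar' forms of the triple-angle formulae.
    assert math.isclose(math.cos(3 * theta), 4 * c**3 - 3 * c, abs_tol=1e-12)
    assert math.isclose(math.sin(3 * theta), 3 * s - 4 * s**3, abs_tol=1e-12)
```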

In Analysis, the definition of the exponential function is extended from the 
real numbers to the complex numbers so that 


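\[ e^{i\theta} = \cos\theta + i\sin\theta. \]

If we do this, then the modulus-argument form of a complex number is $z = re^{i\theta}$, 
and we have 

\[ e^{i\theta_1} \cdot e^{i\theta_2} = e^{i(\theta_1 + \theta_2)}. \]

De Moivre’s Theorem becomes 

\[ (e^{i\theta})^n = e^{in\theta}. \]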


1.5 Induction. The induction property of the natural numbers—which says 
that if you start at the beginning and step through them one at a time, then you 
eventually reach any number—is an important proof technique. 

We summarise the Principle of Induction formally in a theorem as follows. 
(In the domino example of Figure 1.2, P(n) is the proposition ‘Domino number n 
will fall’.) 


Theorem 1.8 (Principle of Induction) Let P(n) be a statement about the 
natural number n. Suppose that 


(a) P(0) is true; 
(b) For any natural number n, if P(n) is true, then P(n +1) is true. 


Then P(n) is true for every natural number n. 


Proof Let S be the set of all those natural numbers n for which P(n) is true. 
Then the hypotheses of the theorem tell us that 0 ∈ S, and that, if n ∈ S, then 
n + 1 ∈ S. So the induction property shows us that S is the set of all natural 
numbers. 

There are several variations on this principle. Perhaps, in place of knowing 
P(0), we know P(1). Then we can conclude that P(n) holds for all n ≥ 1. A 
similar statement would hold with 100, or any fixed number, replacing 1. 

It is important to note that, in a proof by induction, we have two jobs: to 
prove P(0) (called starting the induction) and to prove that the implication 
from P(n) to P(n + 1) holds (called the inductive step). However, there is 
another version, called the Principle of Strong Induction, which appears to 
get by without starting the induction. 


Theorem 1.9 (Principle of Strong Induction) Let P(n) be a statement 
about the natural number n. Suppose that, for any natural number n, if P(m) 
is true for all m < n, then P(n) is true. Then P(n) is true for every natural 
number n. 


Proof This time let S be the set of natural numbers n having the property 
that P(m) holds for all m < n. Now: 


(a) 0 ∈ S; for there are no natural numbers m < 0, so P(m) vacuously holds 
for all of them! 

(b) If n ∈ S, then P(m) holds for all m < n. By hypothesis, P(n) holds. Now 
any number m < n + 1 either satisfies m < n or m = n, and P(m) holds in 
either case. So n + 1 ∈ S. 


By the Induction Property, S contains all natural numbers; so, given n, we have 
n + 1 ∈ S, so P(n) is true. 


It is time to have an example of the use of this principle. Suppose that you 
are asked to find the sum of the first n squares, that is, find 


\[ 1^2 + 2^2 + \cdots + n^2. \]

It is a daunting task without help. But suppose you are told, or guess, that the 
answer is n(n + 1)(2n + 1)/6. Then you can prove your guess by induction. Let 
P(n) be the statement 

\[ 1^2 + 2^2 + \cdots + n^2 = n(n+1)(2n+1)/6. \]

Now P(1) is true: for, when n = 1, the left-hand side is 1² = 1, and the right-hand 
side is 1 · 2 · 3/6 = 1. 

Suppose that P(n) is true; that is, 

\[ 1^2 + 2^2 + \cdots + n^2 = n(n+1)(2n+1)/6. \]

Then 

\begin{align*}
1^2 + 2^2 + \cdots + n^2 + (n+1)^2 &= n(n+1)(2n+1)/6 + (n+1)^2 \\
&= (n+1)(2n^2 + n + 6n + 6)/6 \\
&= (n+1)(n+2)(2n+3)/6.
\end{align*}

But the right-hand side is what we get from our expression n(n + 1)(2n + 1)/6 
by substituting n + 1 for n. So we have verified P(n + 1). 

By the Principle of Induction, we have proved P(n) for all n ≥ 1. 
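
Before attempting such a proof, it is sensible to test the guess on small cases. A minimal sketch (the function names are our own) compares the term-by-term sum with the conjectured closed form. Checking finitely many values is not a proof, which is why the induction argument is still needed.

```python
def sum_of_squares(n: int) -> int:
    """Add 1^2 + 2^2 + ... + n^2 term by term."""
    return sum(k * k for k in range(1, n + 1))


def formula(n: int) -> int:
    """The conjectured closed form n(n+1)(2n+1)/6."""
    return n * (n + 1) * (2 * n + 1) // 6


# The two agree for every n we try, which supports the guess.
assert all(sum_of_squares(n) == formula(n) for n in range(1, 200))
print(sum_of_squares(10), formula(10))  # 385 385
```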

Study this proof carefully. It seems at first that we are assuming what we 
are asked to prove. If we were, the argument would not be valid. You should 
convince yourself that this is not the case. 

Here is another example. Consider the sequence 


\[ \sqrt{2}, \quad \sqrt{2 + \sqrt{2}}, \quad \sqrt{2 + \sqrt{2 + \sqrt{2}}}, \quad \ldots \]

We want to show that the terms of this sequence increase, but are never greater 
than 2. 

Let xₙ be the nth term of the sequence. The relationship between consecutive 
terms is 

\[ x_{n+1} = \sqrt{2 + x_n}. \]

We prove by induction that xₙ < xₙ₊₁ and xₙ < 2 for all n. 

Both of these statements are true for n = 0. (Why is √2 < √(2 + √2)?) 
Suppose that xₙ < xₙ₊₁. Then 

\[ x_{n+1} = \sqrt{2 + x_n} < \sqrt{2 + x_{n+1}} = x_{n+2}, \]

since √(2 + x) is a strictly increasing function of x for positive x. (This means 
that, if x < y, then √(2 + x) < √(2 + y).) 

Now suppose that xₙ < 2. Then 

\[ x_{n+1} = \sqrt{2 + x_n} < \sqrt{2 + 2} = 2, \]



using the same fact again. 

So we have proved the inductive step, and both statements follow by 
induction. 

Incidentally, from real analysis we know that an increasing sequence which is 
bounded tends to a limit. What is the limit of this sequence? (If you cannot see 
the answer immediately, calculate a few terms of the sequence.) 
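
To calculate a few terms, a short program will do. The sketch below (the function name `terms` is our own) builds the sequence from the relation between consecutive terms, starting from x₀ = √2, and checks the two properties just proved for the terms it computes. It works in floating-point arithmetic, so the printed values are approximations.

```python
import math


def terms(count: int) -> list[float]:
    """Return x_0, ..., x_{count-1} (count >= 1), where x_0 = sqrt(2)
    and x_{n+1} = sqrt(2 + x_n)."""
    xs = [math.sqrt(2)]
    while len(xs) < count:
        xs.append(math.sqrt(2 + xs[-1]))
    return xs


xs = terms(8)
for n, x in enumerate(xs):
    print(n, x)

# The computed terms increase and stay below 2.
assert all(a < b for a, b in zip(xs, xs[1:]))
assert all(x < 2 for x in xs)
```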

Here is an example of the use of strong induction. This is a result which was 
used in the proof of Euclid’s Theorem. 


Proposition 1.10 Any natural number greater than 1 has a prime factor. 


Proof We have to show that, if n > 1, then n has a prime factor. We do this 
by strong induction. Let n be a natural number, and assume that, if m is any 
natural number satisfying 1 < m < n, then m has a prime factor. 


• If n ≤ 1 then the statement is vacuously true. 
• If n is prime, then it is a prime factor of itself, and the statement is true. 
• Suppose that n is composite; then n = ab, where 1 < a, b < n. By the 
induction hypothesis, a has a prime factor p. Now p is a prime factor of n, 
and so again the statement is true. 


We have covered all cases, and so the proof is done. 
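
The structure of this proof is that of a recursive procedure: to deal with n, we appeal to the same statement for a smaller number. The following sketch (the function name `prime_factor` is our own, and trial division is used only for simplicity) follows the cases of the proof.

```python
def prime_factor(n: int) -> int:
    """Return a prime factor of n, for any natural number n > 1."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    for a in range(2, n):
        if n % a == 0:
            # n = a*b is composite with 1 < a < n: pass to the smaller
            # number a, just as the proof uses the induction hypothesis.
            return prime_factor(a)
    # No divisor strictly between 1 and n, so n itself is prime.
    return n


print(prime_factor(91))  # 7
print(prime_factor(97))  # 97
```

The recursion must stop, because each call is made with a smaller number; this is the same reason that strong induction is valid.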


Another consequence of the induction property is the following fact about 
the natural numbers. 


Theorem 1.11 Let T be any non-empty subset of the natural numbers. Then 
T has a smallest element. 


Proof We show the contrapositive form of this statement: that is, if T is a 
subset of the natural numbers which has no smallest element, then T is empty. 

So suppose that T has no smallest element. Let S be the complement of T, the 
set of all natural numbers not in T. Let n be a natural number, and suppose that 
every natural number m smaller than n belongs to S. Then n must belong to S also; 
for, if n ∈ T, then n would be the smallest element of T (since all smaller numbers 
are in S). By the Strong Induction principle, S = N, and so T is empty. 


This property is sometimes referred to as the well-ordering property. 


Exercise 1.10 Show that (x + yi)(x − yi) = x² + y². Hence show that, if x + yi ≠ 0, 
then we can divide by it: 

\[ \frac{u + vi}{x + yi} = \left(\frac{ux + vy}{x^2 + y^2}\right) + \left(\frac{vx - uy}{x^2 + y^2}\right) i. \]

Exercise 1.11 Show that the square root of x + yi is 


(Vi + var) + (ECF), 


[Hint: Square it and see!] Can you be sure that both the real and imaginary parts are 
genuine real numbers (that is, they are square roots of non-negative real numbers)? 


Exercise 1.12 Prove by induction that every natural number is either even or odd. 
Prove also that no natural number can be both even and odd. 


Exercise 1.13 Prove the following statements by induction. 


(a) The sum of the first n positive integers is n(n + 1)/2. 
(b) The sum of the cubes of the first n positive integers is equal to the square of their 
sum. 


Exercise 1.14 When the mathematician Gauss was in primary school, his teacher 
asked the class to add up all the numbers from 1 to 100. Gauss saw that, if he took the 
sum 


\[ S = 1 + 2 + 3 + \cdots + 99 + 100, \]

and wrote it down reversed, 

\[ S = 100 + 99 + 98 + \cdots + 2 + 1, \]

then each pair of numbers in the two sums adds up to 101. So 

\[ 2S = 100 \times 101 = 10100, \]


and S = 5050. Your job is to turn this into a general proof that the sum of the natural 
numbers from 1 to n is n(n + 1)/2. 


Exercise 1.15 Use induction to prove each of the following statements: 
(a) For all n ≥ 1, 

\[ \frac{1}{1 \times 3} + \frac{1}{3 \times 5} + \cdots + \frac{1}{(2n-1) \times (2n+1)} = \frac{n}{2n+1}. \]

(b) 4ⁿ > 16n² for all n > 4. 
(c) For all n > 2, 


Log: ai a wits se A 1 
22-1 


Ray PS Ae ne DD: 


Exercise 1.16 Let a₁, a₂, a₃, ... be numbers satisfying the rules that a₁ = 1 and 
aₙ = 2aₙ₋₁ for all n > 1. 


(a) Write down the first few numbers aₙ. 
(b) Guess a formula for aₙ. 
(c) Prove your guess by induction. 


Exercise 1.17 (*) Euclid’s proof that there are infinitely many primes gives us a rule 
for finding a new prime, if we already have a finite number: 


Suppose that we have found n primes already, say p₁, p₂, ..., pₙ. 
Multiply them together and add one: let N be this number, so that N = p₁p₂···pₙ + 1. 


If N is prime, take it to be the next prime pₙ₊₁. Otherwise, take pₙ₊₁ to be the 
smallest prime which divides N. 
Euclid gives us a guarantee that pₙ₊₁ is different from all the primes p₁, ..., pₙ. 


Take p; = 2. Use MAPLE or a calculator to find pe, ps,..., ps. 

Experiment with taking different primes for p₁. Does the prime 2 always turn up in 
the list sooner or later? Does the prime 3 always turn up? What is the main difficulty 
in the calculation? 


Exercise 1.18 Prove, using the well-ordering property, that an infinite strictly 
decreasing sequence of positive integers (that is, a sequence a₁, a₂, a₃, ... satisfying 
aₙ > aₙ₊₁ for all n) cannot exist. 


Exercise 1.19 What is wrong with the following argument? 


Proposition 1.12 All horses have the same colour. 
Proof Let P(n) be the proposition that, in a set of n horses, all the horses 
have the same colour. We start the induction with P(1), which is clearly true. 

Now suppose that P(n) is true. Let {H₁, ..., Hₙ₊₁} be a set of n + 1 horses. 
Then {H₁, ..., Hₙ} is a subset containing n horses; by P(n), they all have the 
same colour. Similarly, {H₂, ..., Hₙ₊₁} is a set of n horses, so these also have the 
same colour. It follows that {H₁, ..., Hₙ₊₁} all have the same colour; so P(n + 1) 
is true. 


By the Principle of Induction, P(n) is true for all positive integers n. 


Elementary algebra 


Abu Ja’far Muhammad ibn Musa al-Khwarizmi (whose name gives us the word 
‘algorithm’) wrote an algebra textbook which included much of what is still 
regarded as elementary algebra today. The title of his book was Hisab al-jabr 
w’al-muqabala. The word al-jabr means ‘restoring’, referring to the process of 
moving a negative quantity to the other side of an equation; the word al-muqabala 
means ‘comparing’, and refers to subtracting equal quantities from both sides 
of an equation. Both processes are familiar to anyone who has to solve an 
equation! The word al-jabr has, of course, been incorporated into our language as 
‘algebra’. 
In this section we briefly revise the techniques of elementary algebra. 


1.6 Formulae and equations. A formula, or expression, is some collec- 
tion of symbols like 


we sin(logy) 2) +2" + 196883. 


This formula contains a variable x, and the assumption is that if we assign 
a numerical value to x, then we can in principle evaluate the formula and 
obtain a number. (We may not be able to do that in practice; if, for example, 
x = 8, then the above formula cannot be evaluated because the universe 
is not large enough to write down the answer!) We allow a formula to 
contain more than one variable. Thus, x? + 2¥ is a formula with two variables 
x and y. 

In Algebra, for the most part, we use only formulae built up using 
the arithmetic operations (addition, subtraction, multiplication, and division) 
and sometimes others such as exponentiation and taking square roots. More 
complicated functions such as sines and logarithms lie in the domain of 
‘analysis’. 

An equation is a mathematical statement of the form 


\[ F_1 = F_2, \]

where F₁ and F₂ are formulae. Now it may be that, no matter what values we 
substitute for the variables in the formulae F₁ and F₂, the equation is true. In 
this case, the equation is called an identity. An example of an identity is 

\[ x^4 + x^2 + 1 = (x^2 + x + 1)(x^2 - x + 1). \]


If an equation is not an identity, then there still may be some values of the 
variables for which the equation is true when these values are substituted. The 
procedure of finding all such values is called solving the equation, and the values 
are the solutions. An equation may have no solution, or one, or more than one. 
For example, the equation 


\[ x^2 = x + 2 \]

has two solutions: x = 2 and x = −1. 

In solving an equation, we can apply any operation to it provided that we do 
the same thing to both sides. For example, from the above equation, we could 
obtain x² − x = 2 (by subtracting x from each side), or 2x² = 2x + 4 (by 
multiplying each side by 2). However, we cannot obtain 2x² = 2x + 2, since we 
have failed to multiply everything on the right by 2. 
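
The difference between an identity and an equation with solutions can be seen by substituting values. The sketch below is our own illustration: it tests the identity x⁴ + x² + 1 = (x² + x + 1)(x² − x + 1) and the equation x² = x + 2 on a range of integers only, so it supports the claims without proving them.

```python
def lhs(x):
    return x**4 + x**2 + 1


def rhs(x):
    return (x**2 + x + 1) * (x**2 - x + 1)


# An identity holds for every value substituted (here, a sample of integers).
assert all(lhs(x) == rhs(x) for x in range(-50, 51))

# The equation x^2 = x + 2 holds only for particular values.
solutions = [x for x in range(-50, 51) if x**2 == x + 2]
print(solutions)  # [-1, 2]

# Doubling both sides keeps the solutions; doubling only part of the
# right-hand side gives a different equation with different solutions.
assert [x for x in range(-50, 51) if 2 * x**2 == 2 * x + 4] == solutions
assert [x for x in range(-50, 51) if 2 * x**2 == 2 * x + 2] != solutions
```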

Originally, the purpose of algebra was to solve equations! 


1.7 Brackets. The formulae 2x + 5 and 2(x + 5) are different; when x = 2, 
the first evaluates to 9 and the second to 14. 

The difference between them depends on a convention universally adopted in 
mathematics: 


In evaluating a formula, we perform multiplications and divisions 
before additions and subtractions. 


This rule is called precedence of operators. 

Thus, in the first formula above, we multiply 2 and x and then add 5 to the 
result. If we wish instead to add x and 5 and then multiply 2 by the result, we 
have to change the precedence of the operators. So we supplement the precedence 
rule by another rule asserting that, if part of a formula is enclosed in brackets, 
then this part is evaluated first and then treated as a single quantity in the later 
evaluation. The second formula above thus does exactly what we want. 

Brackets can be nested, in which case they are evaluated from the inside out. 
For example, the formula 


x + 2(y + 3(z + 4)) 

says: ‘add z to 4, multiply the result by 3, add y to this, multiply the result by 
2, and finally add x’. 


Remember the distributive law we met earlier, which states that 


a(b+ c) = ab+ ac. 
Using this, if a formula contains brackets, we may replace it by a formula not 
containing brackets. This is called expanding the brackets. For example, the 
formula with nested brackets above can be changed into 

\[ x + 2y + 6z + 24. \]
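
As a check, we can evaluate the formula x + 2(y + 3(z + 4)) from the inside out and compare it with the expanded form x + 2y + 6z + 24. The following sketch does this for a small grid of integer values; the function names `nested` and `expanded` are our own.

```python
def nested(x, y, z):
    """Evaluate x + 2(y + 3(z + 4)) from the innermost bracket outwards."""
    inner = z + 4           # add z to 4
    middle = y + 3 * inner  # multiply by 3, then add y
    return x + 2 * middle   # multiply by 2, then add x


def expanded(x, y, z):
    """The same formula after expanding the brackets."""
    return x + 2 * y + 6 * z + 24


assert all(
    nested(x, y, z) == expanded(x, y, z)
    for x in range(-3, 4)
    for y in range(-3, 4)
    for z in range(-3, 4)
)
print(nested(1, 2, 3), expanded(1, 2, 3))  # 47 47
```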


Brackets may contain arbitrarily complicated expressions. If you are expanding 
brackets, remember to multiply everything inside the brackets. So 
2(3x² + 4y + 5z) = 6x² + 8y + 10z, for example. 

If several brackets have to be multiplied together, the work should be done 
in stages: 


(a + b)(c + d) = (a + b)c + (a + b)d = ac + bc + ad + bd. 


Finally, note that mathematicians use several types of brackets, such as ( ), 
[ ], { }. In the past, these were all used in formulae; some mathematical pub- 
lishers even had rules about the order in which they were to be used in a nested 
expression! 

The rule now is only to use the ordinary ‘parentheses’ ( ) for this job, as 
the others have different meanings. We have seen that ‘braces’ { } are used for 
sets, while ‘square brackets’ [ ] are sometimes used to denote the integer part 
or ‘round-down’ of the expression, as in [9/2] = 4. (It is better to use the more 
specialised brackets ⌊ ⌋ for this, so that ⌊9/2⌋ = 4. Then we can use ⌈ ⌉ for 
‘round-up’, as in ⌈9/2⌉ = 5.) 

Still other brackets such as ‘angle brackets’ ⟨ ⟩ are used with other specialised 
meanings. We can also think of modulus signs | | as a kind of brackets. 

Some mathematical expressions have implicit brackets. In the formulae 

\[ \frac{a + b}{c + d}, \qquad \sqrt{e + f}, \]

the three additions must be performed before the division and taking the square 
root, even though there are no actual brackets in the formulae. 


1.8 Fractions. Formulae may contain fractions, such as $\frac{a+b}{c+d}$ above. This 
can also be written (to save space) as (a + b)/(c + d). 
The rules for manipulating fractions are 


• $\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}$; 
• $\frac{a}{b} - \frac{c}{d} = \frac{ad - bc}{bd}$; 
• $\frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}$; 
• $\frac{a/b}{c/d} = \frac{ad}{bc}$; 
• $\frac{ax}{bx} = \frac{a}{b}$. 

The addition and subtraction rules involve putting the fractions over a common 
denominator. They are easily proved using the last rule (the cancellation 
rule) in reverse. Thus $\frac{a}{b} = \frac{ad}{bd}$ and $\frac{c}{d} = \frac{bc}{bd}$; now they have the same denominator 
and can be added. 
Do not learn these rules. Rather, you should practise until you can manipulate 
fractions without thinking. Also, fractions can be cancelled at any time, not just 
at the end of the calculation. If you have to work out $\frac{1}{5} - \frac{3}{40}$, it is better to write 
them with a denominator of 40 to get 

\[ \frac{1}{5} - \frac{3}{40} = \frac{8}{40} - \frac{3}{40} = \frac{5}{40} = \frac{1}{8} \]

than to apply the rules literally: 

\[ \frac{1}{5} - \frac{3}{40} = \frac{1 \cdot 40 - 3 \cdot 5}{5 \cdot 40} = \frac{25}{200} = \frac{1}{8}; \]

the bigger numbers in the second calculation make mistakes more likely. 
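
The same calculation can be reproduced by machine. In the sketch below, `subtract_by_rule` is our own helper which applies the subtraction rule literally and returns the uncancelled numerator and denominator; Python's standard `fractions.Fraction` type then does the cancelling.

```python
from fractions import Fraction


def subtract_by_rule(a, b, c, d):
    """Apply a/b - c/d = (ad - bc)/(bd) literally; return the unreduced pair."""
    return (a * d - b * c, b * d)


num, den = subtract_by_rule(1, 5, 3, 40)
print(num, den)                          # 25 200
print(Fraction(num, den))                # 1/8 (Fraction cancels common factors)
print(Fraction(1, 5) - Fraction(3, 40))  # 1/8
```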


1.9 Square roots. The square root of a non-negative number x is the 
non-negative number y such that y² = x. Notice that, at least for real numbers, 
only non-negative numbers have square roots, and that the square root is itself 
non-negative. So, even though it is true that (−4)² = 16, yet the square root 
of 16 is 4 and not −4. 

There is no simple formula for adding square roots. The multiplication rule 
is 

\[ \sqrt{a} \cdot \sqrt{b} = \sqrt{ab}. \]

This means that x√a = √(x²a) if x is non-negative. 
Remember that square roots are implicitly bracketed. Thus when we multiply 
out, everything under the square root sign must be multiplied: 

\[ x\sqrt{a + b} = \sqrt{x^2 a + x^2 b}. \]

But do not make the mistake of thinking that √(a + b) = √a + √b; this is almost 
always wrong! 
Similar principles hold for cube (and other) roots. 
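
A quick numerical experiment shows the difference between the rule that holds and the ‘rule’ that does not. The values a = 9, b = 16 and x = 3 below are our own choice, picked so that the square roots come out exactly.

```python
import math

a, b, x = 9.0, 16.0, 3.0

# Multiplication rule: sqrt(a) * sqrt(b) = sqrt(ab).
assert math.isclose(math.sqrt(a) * math.sqrt(b), math.sqrt(a * b))

# Taking a non-negative factor x inside the root:
# x * sqrt(a + b) = sqrt(x^2 a + x^2 b).
assert math.isclose(x * math.sqrt(a + b), math.sqrt(x**2 * a + x**2 * b))

# There is no such rule for addition: sqrt(9 + 16) = 5, but sqrt(9) + sqrt(16) = 7.
print(math.sqrt(a + b), math.sqrt(a) + math.sqrt(b))  # 5.0 7.0
```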


1.10 Powers. If n is a positive integer, then xⁿ means the expression 
obtained by taking n factors x and multiplying them together: for example, 

\[ x^4 = x \cdot x \cdot x \cdot x. \]


The rules for expressions with powers are: 

\[ x^{m+n} = x^m \cdot x^n, \qquad x^{mn} = (x^m)^n, \qquad (xy)^n = x^n \cdot y^n. \]

Mathematicians have extended this definition: if x is positive, it is possible 
to give a meaning to xʳ for any real number r, in such a way that the three laws 
just stated continue to hold. The important cases to remember are 

\[ x^0 = 1, \qquad x^{-1} = \frac{1}{x}, \qquad x^{1/2} = \sqrt{x}. \]

In general, x⁻ⁿ = 1/xⁿ. 


1.11 Polynomials. A polynomial in the variable x is a formula which is 
a sum of a number of terms each of the form axⁿ, where a is a number and 
n a non-negative integer. (Remember that x⁰ = 1, so that ax⁰ is just a.) In 
this section we take the word ‘number’ to mean ‘real number’. An example of a 
polynomial is 


27x? + 203x? — 31a +5. 


Often we use the function notation f(x) to denote a polynomial in the variable 
x. Then, if c is any number, f(c) denotes the evaluation of f(x) when x is given 
the value c. 

The expressions axⁿ making up a polynomial are its terms, and the degree 
of the term axⁿ is the number n. A constant term is one whose degree is zero. 

We assume that the coefficient a of any term is non-zero (we omit any terms 
with zero coefficients), and that different terms have different degrees (as several 
terms with the same degree can be combined into a single term). The only 
problem here is that there is a polynomial with no terms at all, which we write 
as 0 (the alternative would be not to write anything, which may be confusing!). 

Polynomials are added and multiplied as formulae. This means that, if the 
same power of x occurs in two polynomials, then when we add them we can 
combine the corresponding terms. For example, 


(272° + 2032? — 31e +5) + (2? — 2002? + 312 +7) = 272° + 2° + 32? +12. 


A polynomial can also be thought of as a function, whose value for a given value 
of x is obtained by substituting the value of x and then evaluating the result. 
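
As a minimal sketch of these two viewpoints (polynomial as formula, polynomial as function), the Python fragment below stores a polynomial as a list of coefficients, lowest degree first. The representation, the function names `poly_add` and `poly_eval`, and the two sample polynomials are assumptions chosen for illustration; they are not taken from the text.

```python
from itertools import zip_longest

def poly_add(f, g):
    """Add two polynomials given as coefficient lists [a0, a1, a2, ...]."""
    total = [a + b for a, b in zip_longest(f, g, fillvalue=0)]
    while total and total[-1] == 0:   # omit terms with zero coefficient
        total.pop()
    return total                      # [] stands for the zero polynomial

def poly_eval(f, c):
    """Evaluate f at x = c by substituting c for x (Horner's rule)."""
    value = 0
    for a in reversed(f):
        value = value * c + a
    return value

f = [5, -31, 203]       # 203x^2 - 31x + 5
g = [7, 31, -200, 1]    # x^3 - 200x^2 + 31x + 7
h = poly_add(f, g)      # the x terms cancel and the x^2 terms combine
```

Adding as formulae and then evaluating agrees with evaluating each polynomial and adding the numbers, which is what we expect if a polynomial is also to be regarded as a function.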

In fact, the question ‘what exactly is a polynomial?’ is much more difficult
to answer than indicated here. To mention just two problems:


• If two polynomials are identical apart from the fact that the variables have
been given different names, are they the same polynomial or not?

• If two polynomials give rise to identical functions, are they the same
polynomial or not?


In the next chapter, we will see how mathematicians currently view these 
questions. 

Addition and multiplication of polynomials satisfy many of the same laws as 
the same operations for numbers: the commutative, associative, and distributive 
laws all hold. These statements do not strictly require that the coefficients of
the polynomials are real numbers: it is enough that the coefficients themselves
should satisfy these three laws, so any of the standard number systems will
do. (See Exercise 1.25.) Also, adding the polynomial 0, or multiplying by the
polynomial 1, has no effect.

The degree of a polynomial is the largest degree of any of its terms. (According
to this definition, the zero polynomial 0 does not have a degree. Some people
arbitrarily set its degree to be $-1$, or $-\infty$, but my convention seems simpler.) A
polynomial with degree 0, 1, 2, 3, 4, or 5 is called constant, linear, quadratic,
cubic, quartic, or quintic, respectively.

As for integers, there is a division algorithm for polynomials: if $f(x)$ and
$g(x)$ are polynomials, with $g(x)$ non-zero, then there exist a quotient $q(x)$
and a remainder $r(x)$ such that


• $f(x) = g(x)q(x) + r(x)$;
• either $r(x) = 0$, or $r(x)$ has degree smaller than the degree of $g(x)$.


The way of finding the quotient and remainder is very similar to the division 
algorithm for integers. If the remainder r(x) is the zero polynomial, we say that 
g(x) divides f(x). 

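Here is an example: Divide $x^4 + 4x^3 - x - 5$ by $x^2 + 2x - 1$.


$$
\begin{array}{r|rrrrr}
 & & & x^2 & {}+2x & {}-3 \\ \cline{2-6}
x^2+2x-1 & x^4 & {}+4x^3 & & {}-x & {}-5 \\
 & x^4 & {}+2x^3 & {}-x^2 & & \\ \cline{2-4}
 & & 2x^3 & {}+x^2 & {}-x & \\
 & & 2x^3 & {}+4x^2 & {}-2x & \\ \cline{3-5}
 & & & -3x^2 & {}+x & {}-5 \\
 & & & -3x^2 & {}-6x & {}+3 \\ \cline{4-6}
 & & & & 7x & {}-8
\end{array}
$$


This calculation shows that when we divide $x^4 + 4x^3 - x - 5$ by $x^2 + 2x - 1$, the
quotient is $x^2 + 2x - 3$ and the remainder is $7x - 8$.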
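
The long-division procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: the function name `poly_divmod` and the choice of coefficient lists written highest degree first are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that no rounding occurs.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g (coefficient lists, highest degree first).

    Returns (q, r) with f = g*q + r, where r is [] (the zero polynomial)
    or has degree smaller than the degree of g.
    """
    if not g or g[0] == 0:
        raise ValueError("divisor needs a non-zero leading coefficient")
    r = [Fraction(a) for a in f]
    q = []
    while len(r) >= len(g):
        factor = r[0] / g[0]          # next coefficient of the quotient
        q.append(factor)
        for i, b in enumerate(g):     # subtract factor * g, aligned at the top
            r[i] -= factor * b
        r.pop(0)                      # the leading term has been cancelled
    while r and r[0] == 0:            # drop leading zeros of the remainder
        r.pop(0)
    return q, r

# x^4 + 4x^3 - x - 5 divided by x^2 + 2x - 1
q, r = poly_divmod([1, 4, 0, -1, -5], [1, 2, -1])
```

For the example above, `q` holds the coefficients 1, 2, -3 and `r` holds 7, -8, matching the quotient and remainder found by hand.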
In one particular case, it is easy to calculate the remainder: 


Theorem 1.13 (Remainder Theorem) If $f(x)$ is divided by $x - c$, the
remainder is $f(c)$.


Proof Suppose that $f(x) = (x - c)q(x) + r(x)$. Since $r(x)$ has degree less
than 1, it is a constant polynomial. Then substituting $x = c$ we find that $f(c) =
r(c)$, so that $r(x)$ is the constant polynomial $f(c)$ (or the zero polynomial, if
$f(c) = 0$).


From this we immediately obtain Theorem 1.14. 


Theorem 1.14 (Factor Theorem) Let $f(x)$ be a polynomial and $c$ a number.
Then $x - c$ divides $f(x)$ if and only if $f(c) = 0$.
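
The Remainder and Factor Theorems can be checked numerically with a short sketch of division by $x - c$ (often called synthetic division). The function name `divide_by_linear` and the sample cubic are assumptions made for illustration only.

```python
def divide_by_linear(f, c):
    """Divide f (coefficients, highest degree first, non-empty) by x - c.

    Returns (quotient coefficients, remainder). The running value `acc`
    is exactly Horner's rule, so the final value is f(c).
    """
    partial = []
    acc = 0
    for a in f:
        acc = acc * c + a
        partial.append(acc)
    return partial[:-1], partial[-1]

cubic = [1, -6, 11, -6]                      # x^3 - 6x^2 + 11x - 6
quotient, remainder = divide_by_linear(cubic, 2)
```

Here the remainder on division by $x - 2$ is zero, so $x - 2$ is a factor; dividing by $x - 4$ instead leaves the remainder $f(4) = 6$.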

A non-constant polynomial is called irreducible if it cannot be written as the
product of two polynomials of smaller degree. Any linear polynomial is obviously
irreducible, since the only polynomials of smaller degree are constants.

Over the real numbers, the polynomial $x^2 + 1$ is irreducible. For, if it is not
irreducible, it must have a factor of degree 1, which we can take to be $x - c$ for
some real number $c$; then the Factor Theorem shows that $c^2 + 1 = 0$, which is
impossible.

This argument would fail if our numbers were complex numbers. Indeed, we 
would have 


$$x^2 + 1 = (x + i)(x - i).$$


Irreducible polynomials play a similar role to prime numbers, with one 
main difference: any non-constant polynomial can be factorised into irreducible 
polynomials, but the factors are not unique. For example, 


$$6x^2 - 6 = (2x + 2)(3x - 3) = (3x + 3)(2x - 2).$$


We will see in the next chapter how, in a much more general situation, we can 
prove a ‘unique factorisation theorem’ for polynomials. 

To conclude this section, I mention that it is possible to have polynomials 
in more than one variable. For example, a polynomial in $x$ and $y$ is a sum of
terms of the form $ax^m y^n$, where $a$ is a number and $m$ and $n$ are non-negative
integers.


1.12 Quadratic and cubic equations. A polynomial equation is an
equation of the form $f(x) = g(x)$, where $f(x)$ and $g(x)$ are polynomials. By
subtracting $g(x)$ from both sides, we can write this as $h(x) = 0$, where $h(x)$ is
the polynomial $f(x) - g(x)$. In this form, we say that the equation is quadratic,
cubic, ..., if $h(x)$ is quadratic, cubic, ....

Throughout the history of mathematics, one of the most important topics 
has been the problem of solving polynomial equations. 

Here is Al-Khwarizmi’s solution of the quadratic equation $x^2 + 10x = 39$. In
the quotation, the ‘root’ is $x$ and the ‘square’ is $x^2$; according to the conventions
of his day, Al-Khwarizmi did not consider the possibility of negative solutions.
In modern terminology his solution is


$$x = \sqrt{5^2 + 39} - 5 = 3.$$


What is the square which combined with ten of its roots will give 
a sum total of 39? The manner of solving this type of equation is 
to take one-half of the roots just mentioned. Now the roots in the 
problem before us are 10. Therefore take 5, which multiplied by itself 
gives 25, an amount which you add to 39 giving 64. Having taken 
then the square root of this which is 8, subtract from it half the 
roots, 5 leaving 3. The number three therefore represents one root
of this square, which itself, of course is 9. Nine therefore gives the
square. 


Abu Ja’far Muhammad ibn Musa al-Khwarizmi (about 810) 


Notice how Al-Khwarizmi does not use symbols (which were a long way in
the future in his time), but explains the method clearly with an example. In
essence, he gives us an algorithm to solve any equation of the form $x^2 + ax = b$:
halve the coefficient of $x$, square it, add the constant term, take the square root,
subtract half the coefficient of $x$.
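
The five steps of this recipe translate line by line into a short Python sketch. The function name `al_khwarizmi_root` is hypothetical, and the sketch assumes, as Al-Khwarizmi did, that $a$ and $b$ are positive and that only the positive root is wanted.

```python
import math

def al_khwarizmi_root(a, b):
    """Positive root of x^2 + a*x = b (a, b > 0), following the recipe."""
    half = a / 2               # halve the coefficient of x
    square = half * half       # square it
    total = square + b         # add the constant term
    root = math.sqrt(total)    # take the square root
    return root - half         # subtract half the coefficient of x

x = al_khwarizmi_root(10, 39)  # the equation x^2 + 10x = 39
```

With $a = 10$ and $b = 39$ the intermediate values are 5, 25, 64, and 8, exactly as in the quotation.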

The only differences between this and the modern formula are that now we do
not mind negative numbers in our equations, so we write any quadratic as


$$x^2 + ax + b = 0,$$


and similarly we allow both positive and negative solutions. Thus, our $b$ is the
negative of Al-Khwarizmi’s, and we have to subtract it instead of adding, giving
the well-known formula


$$x = \pm\sqrt{(a/2)^2 - b} - a/2 = \frac{-a \pm \sqrt{a^2 - 4b}}{2}.$$


The key idea in obtaining this solution is completing the square. We
recognise that


$$x^2 + ax + (a/2)^2 = (x + a/2)^2,$$


and so by adding $(a/2)^2 - b$ to both sides of the equation $x^2 + ax + b = 0$ we
obtain


$$(x + a/2)^2 = (a/2)^2 - b,$$


so that $x + a/2 = \pm\sqrt{(a/2)^2 - b}$, and subtracting $a/2$ from both sides we obtain
the solution.
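
Completing the square can be sketched in code as well. The function name `solve_monic_quadratic` is an assumption; the sketch uses Python's `cmath.sqrt` so that a negative value of $(a/2)^2 - b$ yields complex roots rather than an error, which goes slightly beyond the real-number setting of this section.

```python
import cmath

def solve_monic_quadratic(a, b):
    """Both roots of x^2 + a*x + b = 0, found by completing the square."""
    half = a / 2
    root = cmath.sqrt(half * half - b)   # square root of (a/2)^2 - b
    return (-half + root, -half - root)  # subtract a/2 from both signs

# Al-Khwarizmi's equation x^2 + 10x = 39, rewritten as x^2 + 10x - 39 = 0
r1, r2 = solve_monic_quadratic(10, -39)
```

Besides Al-Khwarizmi's root 3, this also produces the negative root that he did not consider.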

Can we do anything similar for more complicated equations? Consider the 
cubic equation 


$$x^3 + ax^2 + bx + c = 0.$$

We try completing the cube, using the fact that

$$x^3 + ax^2 + (a^2/3)x + a^3/27 = (x + a/3)^3,$$

so that the given equation becomes

$$(x + a/3)^3 = (a^2/3 - b)x + (a^3/27 - c).$$


Now it is not so simple, since there is still an x on the right. It took 
mathematicians hundreds of years to figure out what to do next! 
We continue this story later in the book. 


Exercise 1.20 This exercise contains a few ‘drill’ questions on manipulation of formulae.
If you cannot do them quickly and accurately, it is very important that you should
practise similar examples until you can. These examples are taken from a course at
Queen Mary, University of London, entitled ‘Essential Mathematics’; the name of the
course is chosen for a good reason!


gee ey eee eee) Ie Mme 
30. 7 OTL 3 NB 4° 9 Jo) } 
~ab? \? a \~? 
( cb3 ) (=) : 


(c) Compute the remainder when —2* + 3x? + 2a — 1 is divided by 2? + 2. 
(d) Add and simplify 


(a) Evaluate 


(b) Simplify 


i 3 
y2—2y—15 ° y2—10y +25" 


(e) Compute f (—1/a*), where 


(£) Simplify 
1 V30— V12V15 
Va (V3- V3? 


to the form m+ nv/d, where m, n, and d are integers. 
(g) Simplify 


Vx? —¢3 — V4 —-4e. 
(h) Solve the equation 


z’-9 6 
x+3 32-1 


Exercise 1.21 Show by means of an example that the distributive law fails for the
‘round-down’ brackets $\lfloor\ \rfloor$; that is, it is not true that $a\lfloor b + c\rfloor = ab + ac$. Indeed it is
not even equal to $\lfloor ab\rfloor + \lfloor ac\rfloor$.


Exercise 1.22 Show directly that $x^n - c^n$ is divisible by $x - c$ for any natural number
$n$. Deduce the Remainder Theorem from this.


Exercise 1.23 Use the preceding exercise to show that $x - 1$ divides $x^k - 1$ for any
natural number $k$, and deduce that $m - 1$ divides $m^k - 1$ for any natural number $m$.
By taking $m = 2^l$, show that $2^l - 1$ divides $2^{kl} - 1$. Deduce that, if $2^n - 1$ is prime,
then $n$ is prime. (Compare page 6, where we showed that the converse is false.)


Exercise 1.24 If you know the Intermediate Value Theorem, use it to show that (over
the real numbers) there is no irreducible polynomial of degree 3.


Exercise 1.25 Show that the proof of commutativity of multiplication of polynomials 
requires the commutativity and associativity of addition for the coefficients, as well as 
the commutativity of their multiplication. 


Exercise 1.26 Let $f$ be a polynomial, and let $f(c)$ be the result of substituting the
real number $c$ for the indeterminate $x$. Are the following statements true or false? If
false, what is the correct rule?


Sets 


In this section, we take a quick look at sets, mainly to introduce the notation 
for unions, intersections, Cartesian products, and so forth, and to examine the 
concepts of relations and functions, especially equivalence relations. Although we 
cannot say what a set is without going round in circles, sets provide the accepted 
basis for mathematics. 


1.13 Introduction. It is very difficult to define a set. On the other hand, 
everybody knows what a set is, and the explanation ‘a set is a collection of 
objects’, though no good as a definition (What is a collection?), is quite clear. 
What is important is that we can tell, of any particular element, whether or not 
it belongs to the set in question. A set is completely determined when we know 
all of its members. 

Often, we use capital letters for sets, and lower case for their elements. We
write $x \in X$ to denote that the element $x$ is a member of (or belongs to) the set
$X$, and $x \notin X$ for the negation of this.

How do we specify a set? 

If it is finite, we can just list its elements inside braces, or curly brackets {}. 
Thus, {1,3,4,5,9} is a set with five elements. 

Certain familiar infinite sets have names. Thus, $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ denote
the sets of numbers, which we met earlier. Other infinite sets can be described
by giving a test for membership in the set. For example,


$$\{x \in \mathbb{Z} : x = 2y + 1 \text{ for some } y \in \mathbb{Z}\}$$

is the set of odd integers.


1.14 Sets and set operations. Two sets are equal if they have the same
members. For example,


$$\{2\} = \{x \in \mathbb{Z} : x > 0,\ x \text{ is even},\ x \text{ is prime}\}.$$

So the basic test for equality is

$$A = B \text{ means that } (x \in A) \Leftrightarrow (x \in B) \text{ for all elements } x.$$


Notice that, to prove two sets equal, we have to prove two things: every element
of $A$ is in $B$; and every element of $B$ is in $A$. (Remember that $P \Leftrightarrow Q$ means
$P \Rightarrow Q$ and $Q \Rightarrow P$.)

The empty set is the set which has no members. It is written as $\emptyset$, a zero
with a slash; not the Greek letter $\phi$ (phi).

The empty set is a notorious source of trouble in mathematics. Arguing about 
it requires cool-headed logic rather than intuition. If you have proved a general 
fact about sets, it is worth checking that it holds for the empty set. Here is an 
example of how to argue with the empty set: 

There is only one empty set. For suppose that $E_1$ and $E_2$ are empty sets.
For any element $x$, the statements $(x \in E_1)$ and $(x \in E_2)$ are both false, since
$E_1$ and $E_2$ have no members. So, according to the rule for ‘if and only if’, the
statement $(x \in E_1) \Leftrightarrow (x \in E_2)$ is true, and by the basic test for equality, we
have $E_1 = E_2$. Informally, $E_1$ and $E_2$ have the same members (viz., none at all).

$A$ is a subset of $B$ if $A$ consists of some (perhaps all) of the elements of $B$.
This is written as $A \subseteq B$, like a curved ‘less than or equal’ sign. So the basic
test for a subset is


$$A \subseteq B \text{ means that } (x \in A) \Rightarrow (x \in B) \text{ for all elements } x.$$


Note that this involves half the work of proving that $A = B$. Any set is a
subset of itself. Also, the empty set is a subset of any set. (For consider the
proposition $(x \in \emptyset) \Rightarrow (x \in A)$. For any $x$, the statement $x \in \emptyset$ is false; and a
false proposition implies any proposition, so the implication is true.) These two
subsets, which always exist, are the trivial subsets.

If $A \subseteq B$ and $A \ne B$, we say that $A$ is a proper subset of $B$, and write
$A \subset B$.


Now we turn to some ways of building new sets from old. 


Definition The union of two sets $A$ and $B$ is the set of all elements lying in
either $A$ or $B$:


$$A \cup B = \{x : x \in A \text{ or } x \in B\}.$$

The intersection of $A$ and $B$ is the set of all elements lying in both:

$$A \cap B = \{x : x \in A \text{ and } x \in B\}.$$

The difference $A \setminus B$ consists of the elements which lie in $A$ but not in $B$:


$$A \setminus B = \{x : x \in A \text{ and } x \notin B\}.$$

Fig. 1.6 Venn diagrams for set operations (panels: $A \cup B$, $A \cap B$, $A \setminus B$, $A \triangle B$)


The symmetric difference $A \triangle B$ consists of the elements which lie in one of
$A$ and $B$ but not in both, so that we have


$$A \triangle B = (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B).$$


For example, if $A = \{2,3,5\}$ and $B = \{1,2,4\}$, then $A \cup B = \{1,2,3,4,5\}$,
$A \cap B = \{2\}$, $A \setminus B = \{3,5\}$, and $A \triangle B = \{1,3,4,5\}$.
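
Python's built-in `set` type provides these four operations directly, so the example can be checked with a few lines. The variable names below are chosen for illustration only.

```python
A = {2, 3, 5}
B = {1, 2, 4}

union = A | B                   # elements lying in either A or B
intersection = A & B            # elements lying in both
difference = A - B              # elements in A but not in B
symmetric_difference = A ^ B    # elements in exactly one of A and B

# The two descriptions of the symmetric difference agree.
same = symmetric_difference == (A - B) | (B - A) == (A | B) - (A & B)
```

Note that a Python set, like a mathematical one, has no particular order: `{2, 3, 5}` and `{5, 3, 2}` compare equal.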

These concepts are conveniently illustrated by Venn diagrams (see 
Figure 1.6). 


An ordered pair $(a,b)$ has a first element $a$ and a second element $b$.
This means that two ordered pairs $(a,b)$ and $(c,d)$ are equal if and only if both
$a = c$ and $b = d$. So the ordered pair $(a,b)$ is not the same as the set $\{a,b\}$: the
elements of a set do not come in any particular order, so $\{a,b\}$ is the same as
$\{b,a\}$. (It does not matter exactly how ordered pairs are defined.)


Definition Let $A$ and $B$ be sets. Their Cartesian product is the set

$$A \times B = \{(a,b) : a \in A,\ b \in B\},$$

the set of all ordered pairs with first element in $A$ and second element in $B$. If
$A = B$, we write $A^2$ for $A \times A$.


The name ‘Cartesian’ commemorates Descartes, who unified algebra and geometry
by the insight that the Euclidean plane (a geometric object) is essentially
the same as the set $\mathbb{R} \times \mathbb{R} = \mathbb{R}^2$: each point of the plane can be represented by
its Cartesian coordinates, which are an ordered pair $(x,y)$ of real numbers; and
every pair of real numbers represents a point.

More generally, if $n$ is a positive integer, then an ordered $n$-tuple, written
$(a_1, a_2, \ldots, a_n)$, has a first, second, ..., $n$th element; and $(a_1, \ldots, a_n) =
(b_1, \ldots, b_n)$ if and only if $a_1 = b_1$, ..., $a_n = b_n$. The Cartesian product of
$A_1, A_2, \ldots, A_n$ is


$$A_1 \times A_2 \times \cdots \times A_n = \{(a_1, a_2, \ldots, a_n) : a_1 \in A_1,\ a_2 \in A_2,\ \ldots,\ a_n \in A_n\},$$


and $A^n = A \times A \times \cdots \times A$ ($n$ factors). Thus, according to Descartes, $\mathbb{R}^n$ is
$n$-dimensional space.

The number of elements in a set $A$ is called the cardinality of $A$, and written
as $|A|$. Although we use the same notation as for the absolute value or modulus of
a number, the meaning is quite different. For example, $|-3| = 3$, but $|\{-3\}| = 1$.

There is a theory of cardinalities of infinite sets, developed by Cantor in the 
nineteenth century, but we do not need this yet. If the set A is finite, then its 
cardinality is a non-negative integer. Here are a couple of basic results about 
cardinalities of finite sets. 


Proposition 1.15 Let $A$ and $B$ be finite sets. Then


(a) $|A \cup B| = |A| + |B| - |A \cap B|$;
(b) $|A \times B| = |A| \cdot |B|$.
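
Both parts of the proposition can be checked on small examples. In the sketch below, the Cartesian product is built with the standard-library function `itertools.product`; the helper name `check_proposition` is hypothetical. A check on examples is of course not a proof.

```python
from itertools import product

def check_proposition(A, B):
    """Return True if both cardinality formulas hold for finite sets A, B."""
    pairs = set(product(A, B))                        # the set A x B
    part_a = len(A | B) == len(A) + len(B) - len(A & B)
    part_b = len(pairs) == len(A) * len(B)
    return part_a and part_b

ok = check_proposition({2, 3, 5}, {1, 2, 4})
ok_empty = check_proposition(set(), {1, 2, 4})        # worth checking!
```

Following the earlier advice, the second call confirms that the formulas also hold when one of the sets is empty.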


Finally, note that the elements of a set may themselves be sets. For example, 
{1,{1,2}} is a set with two elements: the number 1 and the set {1,2}. This 
process can be continued as far as we like: there are sets like {{{{1}}}}; and 
this is not the same set as {{{1}}}. Somewhere in the Galaxy there is a race of 
set theorists to which this comment seems natural and obvious. But, for human 
beings, the concept of a set of sets seems to cause panic and confusion, especially 
when we have to perform operations on its members (which are sets) as if they 
were single objects. This is unavoidable in algebra, as you will see when we reach 
factor rings in the next chapter. You have been warned! 


1.15 Functions. Until fairly recently, mathematicians thought of functions 
as formulae: the logarithm function log, x, the sine function sin x, or more com- 
plicated compounds such as the one given on page 23. Later they introduced 
functions with ‘split’ definitions, such as 

$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational;} \\ 0 & \text{if } x \text{ is irrational.} \end{cases}$$

Eventually it seemed impossible to make a definition of a formula general enough 
for everything that mathematicians wanted to consider; so a completely different 
approach was taken. 


Think of a function $F$ as a black box, where we can feed any element of $A$
into the box, and an element of $B$ will emerge at the other end.


The name of the function is $F$; we put $x$ into the black box and $F(x)$ comes
out. Be careful not to confuse $F$, the name written on the black box, with $F(x)$,
which is what comes out when $x$ is put in. Sometimes the language makes it hard
to keep this straight. For example, there is a function which, when you put in $x$,
outputs $x^2$. We tend to call this ‘the function $x^2$’, but it is really ‘the squaring
function’, or ‘the function $x \mapsto x^2$’. (You see that we have a special symbol $\mapsto$
to denote what the black box does.)

Black boxes are not really mathematical notation, so we reformulate this
definition in more mathematical terms. We have to define what we mean by a
function $F$. Now there will be a set $X$ of allowable inputs to the black box; $X$
is called the domain of $F$. Similarly, there will be a set $Y$ which contains all
the possible outputs; this is called the codomain of $F$. (We do not necessarily
require that every value of $Y$ can come out of the black box. For the squaring
function, the domain and the codomain are both equal to $\mathbb{R}$, even though none
of the outputs can be negative.)

The important thing is that every input $x \in X$ produces exactly one output
$y = F(x) \in Y$. The ordered pair $(x,y)$ is a convenient way of saying that the
input $x$ produces the output $y$. Then we can take all the possible ordered pairs
as a description of the function. Thus we come to the formal definition:


Definition A function $F : A \to B$ is a subset of $A \times B$ such that, for every
element $a \in A$, there is a unique element $b \in B$ for which $(a,b) \in F$. We write
$b = F(a)$ if $(a,b) \in F$. (Note that the definition says nothing at all about the way
in which the function values are actually calculated.) We call $F(a)$ the image
of $a$ under $F$.


A function is also often called a mapping or map: if $F : A \to B$, we say
that $F$ maps elements of $A$ to elements of $B$ (or maps $A$ to $B$). The sets $A$
and $B$ are called the domain and codomain of $F$, respectively. The image or
range of $F$ is the set of values that come out of the black box when all elements
of the domain are fed in:


$$\mathrm{Im}(F) = \{F(a) : a \in A\}.$$


It is a subset of the codomain. 

A function $F : A \to B$ is said to be one-to-one or injective if $a \ne b$ implies
$F(a) \ne F(b)$ (so that different points have different images under $F$). It is onto
or surjective if every point of $B$ is the image of some point of $A$. It is bijective,
or a one-to-one correspondence, if it is both injective and surjective. If $F$ is
a bijection, then the elements of $A$ are ‘paired up’ with elements of $B$ by $F$, so
that, if the sets are finite, they must have the same number of elements.
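
For finite sets, the formal definition and the three properties can be tested mechanically. The sketch below represents a function as a set of ordered pairs, as in the definition; the helper names `is_function`, `is_injective`, and `is_surjective` and the sample data are assumptions made for illustration.

```python
def is_function(F, A, B):
    """Every a in A occurs as first component of exactly one pair of F."""
    inside = all(a in A and b in B for a, b in F)
    unique = all(sum(1 for x, _ in F if x == a) == 1 for a in A)
    return inside and unique

def is_injective(F):
    """Different points have different images."""
    images = [b for _, b in F]
    return len(images) == len(set(images))

def is_surjective(F, B):
    """Every point of B is the image of some point."""
    return {b for _, b in F} == set(B)

A = {1, 2, 3}
B = {"p", "q", "r"}
G = {(1, "q"), (2, "r"), (3, "p")}        # a bijection from A to B
H = {(1, "p"), (2, "p"), (3, "q")}        # neither injective nor surjective
not_a_function = {(1, "p"), (1, "q"), (2, "r")}
```

As the text says, the bijection `G` pairs up the elements of the two sets, which is only possible because both have three elements.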

If $F$ is a bijective function from $X$ to $Y$, then there is an inverse function
$G$ from $Y$ to $X$ which takes every element $y \in Y$ to the unique $x \in X$ for which
$F(x) = y$. In other words, the black box for $G$ is the black box for $F$ in reverse:


$$x = G(y) \text{ if and only if } y = F(x).$$


The inverse function $G$ is also bijective. Thus a bijective function $F$ and its
inverse $G$ satisfy


• $G(F(x)) = x$ for all $x \in X$;
• $F(G(y)) = y$ for all $y \in Y$.

Notice that $F$ is the inverse function of $G$.


Proposition 1.16 If $|A| = m$ and $|B| = n$, then the number of functions from
$A$ to $B$ is $n^m$.


Proof We can represent a function from $A$ to $B$ as a table with two columns,
where we write all the elements of $A$ in the first column, and the value of $F(a)$
opposite the entry $a$. Now there are $|B| = n$ choices for each entry in the second
column of the table; since there are $m$ entries, there are $n^m$ possible tables. Each
table specifies a unique function.
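
The counting argument in the proof can be mirrored in code: each table is one choice of a second column, and `itertools.product` lists all such choices. The function name `all_functions` and the dictionary representation of a table are assumptions made for this sketch.

```python
from itertools import product

def all_functions(A, B):
    """List every function from A to B as a dict {a: F(a)}."""
    first_column = list(A)
    return [dict(zip(first_column, second_column))
            for second_column in product(B, repeat=len(first_column))]

tables = all_functions(["a", "b", "c"], [0, 1])   # m = 3, n = 2
count = len(tables)                               # expected n**m
```

Here `count` is $2^3 = 8$; swapping the roles of the two sets would give $3^2 = 9$ functions instead.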


Sometimes we represent a function $F : A \to B$ by a picture, where we show
the two sets $A$ and $B$, and draw an arrow from each element $a$ of $A$ to the
element $b = F(a)$ of $B$. For such a picture to show a function, each element of
$A$ must have exactly one arrow leaving it. Now


• $F$ is one-to-one (injective) if no point of $B$ has two or more arrows entering it;

• $F$ is onto (surjective) if every point of $B$ has at least one arrow entering it;

• $F$ is one-to-one and onto (bijective) if every point of $B$ has exactly one arrow
entering it; in this case, the arrows match up the points of $A$ with the points
of $B$.


Here are some illustrations. The first is not a function because some elements of 
A have more than one arrow leaving them while some have none. 


[Four arrow diagrams from $A$ to $B$, labelled: Not a function; Onto, not one-to-one;
One-to-one, not onto; One-to-one and onto]


1.16 Binary operations.
Definition A binary operation on a set $A$ is a function $F$ from $A \times A$ to $A$.


Often we write such an operation differently: either with infix notation, using a
symbol such as $\circ$, where we put $a \circ b$ for $F(a,b)$; or with juxtaposition, where we
simply write $ab$ for $F(a,b)$. (Of course, addition in a number system is a binary
operation, written in infix notation with the symbol $+$; and multiplication is
usually written as juxtaposition.) Other symbols used for operations include $+$,
$\times$, $-$, $*$.

More generally, if $n$ is a positive integer, an $n$-ary operation on $A$ is a
function from $A^n$ to $A$. We use the terms ‘unary’ and ‘binary’ for $n = 1, 2$
respectively. But for $n \ne 2$, there is no analogue of infix notation or juxtaposition.

We can think of a binary operation as a black box with two inputs and one
output: we put in $a$ and $b$, and $a \circ b$ comes out.




Proposition 1.17 If $|A| = n$, then the number of binary operations on $A$
is $n^{n^2}$.


Proof An operation is a function from $A \times A$ to $A$. Since $|A \times A| = n^2$ and
$|A| = n$, the result follows from Proposition 1.16.


Suppose that we have a binary operation on $A$, denoted by $\circ$ in infix notation.
Then we have the ‘closure condition’


(C) (Closure law): For all $a, b \in A$, $a \circ b \in A$.


This is really trivial because the operation is a function to the set $A$. It seems
less trivial if we ask the question: When does $\circ$ give an operation on a subset
$X$ of $A$? This will occur if and only if $X$ satisfies the closure law; that is, for all
$a, b \in X$, $a \circ b \in X$. We will ask this question very often!

A binary operation on a finite set can be described by an operation table.
Let $A = \{a_1, \ldots, a_n\}$. Take an $n \times n$ table, and label the rows and columns with
the elements $a_1, \ldots, a_n$. In the position in row $i$ and column $j$, put the element
$a_i \circ a_j$. The whole table can be labelled with the name of the operation.


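Example Let $A = \{\alpha, \beta, \gamma\}$, and define an operation $\circ$ by


$$x \circ y = \begin{cases} x & \text{if } x \ne y, \\ \alpha & \text{if } x = y. \end{cases}$$


The operation table is

$$
\begin{array}{c|ccc}
\circ & \alpha & \beta & \gamma \\ \hline
\alpha & \alpha & \alpha & \alpha \\
\beta & \beta & \alpha & \beta \\
\gamma & \gamma & \gamma & \alpha
\end{array}
$$

In this case there is a simple rule describing the operation. But even when there
is no obvious rule, the operation table describes an operation completely.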


Example In the above example, which subsets $X$ of $A$ have the property that
$\circ$ defines an operation on $X$? That is, which satisfy the closure law?

The only such subsets $X$ are $\emptyset$, $\{\alpha\}$, $\{\alpha, \beta\}$, $\{\alpha, \gamma\}$, and $A$. For the empty
set certainly satisfies closure. If $X$ is non-empty and $x \in X$, then $x \circ x = \alpha \in X$.
There are four sets containing $\alpha$, the other four listed above; they all satisfy
closure. For example, for $\{\alpha, \gamma\}$, the operation table is


$$
\begin{array}{c|cc}
\circ & \alpha & \gamma \\ \hline
\alpha & \alpha & \alpha \\
\gamma & \gamma & \alpha
\end{array}
$$
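
The search for closed subsets can be carried out by brute force. In this sketch the three elements are represented by the strings `"alpha"`, `"beta"`, and `"gamma"`, and the rule used is the one read off from the example ($x \circ y = x$ when $x \ne y$, and $x \circ x = \alpha$); the names `op` and `is_closed` are assumptions made for illustration.

```python
from itertools import combinations

A = ("alpha", "beta", "gamma")

def op(x, y):
    """x o y = x if x != y, and alpha if x == y."""
    return x if x != y else "alpha"

def is_closed(X):
    """Closure law: x o y lies in X for all x, y in X."""
    return all(op(x, y) in X for x in X for y in X)

# Try every subset of A, from the empty set up to A itself.
closed = [set(X)
          for size in range(len(A) + 1)
          for X in combinations(A, size)
          if is_closed(X)]
```

The empty set passes the test because there are no pairs to check, in line with the remark that it certainly satisfies closure.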


1.17 Relations. A binary relation can be thought of as a statement which, 
given any two elements of a set A, is either true or false for that pair. For example, 
the relation ‘less than’ on the integers is true for the pair $(5, 17)$, and false for
the pair $(-1, -2)$. We know the relation completely if we know the set of pairs
for which it is true. So we make a formal definition: 


Definition A binary relation R on a set A is a subset of the Cartesian 
product A x A. 


A binary relation can be thought of as a black box with two inputs, for which 
the only possible outputs are ‘yes’ and ‘no’. If we put in a and b, the output is 
‘yes’ if the pair (a,b) satisfies the relation, and ‘no’ if it does not. 


Proposition 1.18 If $|A| = n$, then the number of binary relations on $A$ is $2^{n^2}$.


Proof Proposition 1.16, since a binary relation can be regarded as a function
from $A \times A$ to {yes, no}.


Often we represent a binary relation by infix notation, in which we place a
symbol between $a$ and $b$ to indicate that $(a,b) \in R$. Typical symbols used are $\sim$,
$\equiv$, as well as the more familiar $=$, $<$, $\le$, and so on. So, for example, we might
use $a \sim b$ to denote $(a,b) \in R$.

For example, if $A = \{1,2,3\}$, then the relation ‘less than’ on $A$ is the set
$\{(1,2), (1,3), (2,3)\}$.

More generally, for any positive integer $n$, an $n$-ary relation on $A$ is a subset
of $A^n$.


1.18 Equivalence relations. Let R be a binary relation on A. Here are 
three laws which R may or may not satisfy: 


(Eq1) (Reflexive law): $(a,a) \in R$ for all $a \in A$.
(Eq2) (Symmetric law): If $(a,b) \in R$ then $(b,a) \in R$.
(Eq3) (Transitive law): If $(a,b) \in R$ and $(b,c) \in R$ then $(a,c) \in R$.

For example, the relation ‘less than’ on the set $\{1,2,3\}$ satisfies (Eq3) (the
only possible choice of $a, b, c$ with $(a,b), (b,c) \in R$ is $a = 1$, $b = 2$, $c = 3$, and
indeed $(1,3) \in R$) but neither (Eq1) nor (Eq2). The relation ‘less than or equal’
on the set $\{1,2,3\}$ satisfies (Eq1) and (Eq3) but not (Eq2).
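
Since a relation on a finite set is just a set of pairs, the three laws can be tested directly. The helper names below are assumptions made for this sketch; the two relations are the ones discussed above.

```python
def is_reflexive(R, A):
    """(Eq1): (a, a) is in R for all a in A."""
    return all((a, a) in R for a in A)

def is_symmetric(R):
    """(Eq2): if (a, b) is in R then (b, a) is in R."""
    return all((b, a) in R for a, b in R)

def is_transitive(R):
    """(Eq3): if (a, b) and (b, c) are in R then (a, c) is in R."""
    return all((a, d) in R for a, b in R for c, d in R if b == c)

A = {1, 2, 3}
less_than = {(a, b) for a in A for b in A if a < b}
less_equal = {(a, b) for a in A for b in A if a <= b}
equality = {(a, a) for a in A}
```

Of the three relations, only equality satisfies all of (Eq1), (Eq2), and (Eq3).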

A very important type of relation consists of equivalence relations: 


Definition An equivalence relation on A is a binary relation satisfying 
(Eq1), (Eq2), and (Eq3); that is, one which is reflexive, symmetric, and transitive. 


If $R$ is an equivalence relation, we define the equivalence class $E(a)$ of the element
$a \in A$ to be $\{b \in A : (a,b) \in R\}$, the set of all elements related to $a$ by $R$.


Example Let $R$ be the relation ‘congruent mod 4’ on the set of integers.
That is,


$$R = \{(a,b) : a, b \in \mathbb{Z},\ a - b = 4x \text{ for some } x \in \mathbb{Z}\}.$$


Now $R$ is
• reflexive, since $a - a = 4 \cdot 0$;
• symmetric, since if $a - b = 4x$ then $b - a = 4(-x)$;
• transitive, since if $a - b = 4x$ and $b - c = 4y$ then $a - c = 4(x + y)$.
So $R$ is an equivalence relation. Among its equivalence classes, we find


$$
\begin{aligned}
E(0) &= \{\ldots, -8, -4, 0, 4, 8, \ldots\}, \\
E(1) &= \{\ldots, -7, -3, 1, 5, 9, \ldots\}, \\
E(2) &= \{\ldots, -6, -2, 2, 6, 10, \ldots\}, \\
E(3) &= \{\ldots, -5, -1, 3, 7, 11, \ldots\}.
\end{aligned}
$$


Note that these four classes cover all the integers without any overlap. Moreover,
$E(4) = E(0)$, and so on; no new classes are obtained. We will see that these are
characteristic properties of an equivalence relation.
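
On a finite window of integers the classes can be computed by sorting each element into the class of something it is related to. The function name `equivalence_classes` and the window from $-8$ to $11$ are assumptions made for this sketch; the method relies on the relation being an equivalence relation, so that comparing with one representative of a class is enough.

```python
def equivalence_classes(elements, related):
    """Group elements into classes of the equivalence relation `related`."""
    classes = []
    for a in elements:
        for cls in classes:
            if related(a, cls[0]):     # compare with one representative
                cls.append(a)
                break
        else:
            classes.append([a])        # a starts a new class
    return classes

def congruent_mod_4(a, b):
    return (a - b) % 4 == 0

classes = equivalence_classes(range(-8, 12), congruent_mod_4)
```

Exactly four classes appear, and every integer in the window lies in precisely one of them.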


Definition A partition of a set $A$ is a set $\{A_1, A_2, \ldots\}$ of subsets of $A$ such
that


(a) $A_i \ne \emptyset$ for all $i$;
(b) each element of $A$ lies in exactly one of the sets $A_i$; in other words,
(b1) $A_1 \cup A_2 \cup \cdots = A$, and
(b2) $A_i \cap A_j = \emptyset$ for $i \ne j$.


So, in the above example, $\{E(0), E(1), E(2), E(3)\}$ is a partition of $\mathbb{Z}$.

Theorem 1.19 (Equivalence Relation Theorem) (a) Let $R$ be an equivalence
relation on a set $A$. Then the set of equivalence classes is a partition of $A$.


(b) Conversely, let $\{A_1, A_2, \ldots\}$ be a partition of $A$. Then there is an
equivalence relation on $A$ whose equivalence classes are $A_1, A_2, \ldots$.

In other words, equivalence relations and partitions are the same thing.


Proof (a) Let $R$ be an equivalence relation on $A$. We have to show that the
equivalence classes form a partition (satisfying conditions (a) and (b) of the
definition). So take $a \in A$. Then $(a,a) \in R$ by (Eq1), so $a \in E(a)$ by definition
of $E(a)$. This has two consequences. First, $E(a)$ is non-empty (since at least
it contains $a$). Second, since $a$ was arbitrary, every element of $A$ is in some
equivalence class. It remains to show that different classes don’t overlap. So
suppose that the element $a$ lies in two classes $E(b)$ and $E(c)$. Then $(b,a), (c,a) \in
R$. By (Eq2), $(a,c) \in R$; then by (Eq3), $(b,c) \in R$, and by (Eq2) again, $(c,b) \in R$.
We want to show that $E(b) = E(c)$; by the test for equality of sets, we must show
that every element of $E(b)$ is in $E(c)$ and vice versa. So take $x \in E(b)$. Then
$(b,x) \in R$. Since $(c,b) \in R$, (Eq3) gives $(c,x) \in R$, so $x \in E(c)$. The reverse
implication is similar.

(b) Now let $\{A_1, A_2, \ldots\}$ be a partition of $A$. Define a relation $R$ on $A$ by
the rule


$$R = \{(a,b) : a, b \in A_i \text{ for some } i\}.$$


This relation R is 


• reflexive: for, given $a \in A$, some set $A_i$ contains $a$, and so $a, a \in A_i$, whence
$(a,a) \in R$;

• symmetric, trivially;

• transitive: for suppose that $(a,b), (b,c) \in R$. Then $a, b \in A_i$ for some $i$, and
$b, c \in A_j$ for some $j$. But only one set of the partition contains $b$, so $A_i = A_j$.
Then $a, c \in A_i$, and $(a,c) \in R$.


So R is an equivalence relation. 

Suppose that $a \in A_i$ (there is exactly one such set $A_i$ for any $a$). We claim
that $E(a) = A_i$. To show this, first take any $x \in E(a)$. Then $(a,x) \in R$, so
$a, x \in A_j$ for some $j$. Since only one set contains $a$, we must have $A_i = A_j$,
and $x \in A_i$. Conversely, if $x \in A_i$, then $a, x \in A_i$, and so $(a,x) \in R$, whence
$x \in E(a)$. The claim is proved.

In fact, more is true. If we take an equivalence relation R, apply the con- 
struction of (a) to obtain a partition, and then apply the construction of (b), the 
equivalence relation that we obtain is just R. 
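
For a finite set, the two constructions in the theorem and the round trip between them can be sketched as follows. The function names `relation_from_partition` and `partition_from_relation` and the sample partition are assumptions made for illustration.

```python
def relation_from_partition(parts):
    """Construction (b): relate a and b when they lie in the same part."""
    return {(a, b) for part in parts for a in part for b in part}

def partition_from_relation(elements, R):
    """Construction (a): collect the equivalence classes E(a)."""
    return {frozenset(b for b in elements if (a, b) in R) for a in elements}

A = {1, 2, 3, 4, 5}
parts = {frozenset({1, 5}), frozenset({2, 4}), frozenset({3})}

R = relation_from_partition(parts)
recovered = partition_from_relation(A, R)

# Going the other way round: relation -> partition -> relation.
R_again = relation_from_partition(recovered)
```

Both round trips return what we started with, which is the sense in which equivalence relations and partitions are ‘the same thing’.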


We will later meet the term ‘canonical form’. If $R$ is an equivalence relation
on a set $A$, a canonical form is just a choice of one element from each equivalence
class of $R$. So we can say that every element of $A$ is equivalent to a unique element
‘in canonical form’. In the example before the Equivalence Relation Theorem,
the elements 0, 1, 2, 3 can be taken as canonical forms. The term also carries a
suggestion that the canonical forms are in some way ‘simpler’ or ‘more natural’
than arbitrary elements of $A$.


We now show how equivalence relations are important in the study of functions
which are not necessarily one-to-one or onto. Let $F : A \to B$ be such a
function. We defined the image of $F$, written $\mathrm{Im}(F)$, as the subset of $B$ consisting
of values of $F$:


$$\mathrm{Im}(F) = \{b \in B : b = F(a) \text{ for some } a \in A\}.$$

Thus

• $F$ is onto if and only if $\mathrm{Im}(F) = B$.

Fig. 1.7 A function


The kernel of $F$ is the relation $R$ on $A$ in which two elements of $A$ are related
if they map to the same element of $B$:


$$\mathrm{KER}(F) = \{(a_1, a_2) : a_1, a_2 \in A,\ F(a_1) = F(a_2)\}.$$

Then it is easy to show that


• $\mathrm{KER}(F)$ is an equivalence relation on $A$, which is equal to the relation of
equality if and only if $F$ is one-to-one.


By the Equivalence Relation Theorem 1.19, we can think of $\mathrm{KER}(F)$ as either an
equivalence relation or a partition. Later, we will meet a different kind of kernel
(which we will write as $\mathrm{Ker}(F)$) which is a subset (rather than a partition) of $A$;
and we will stop to examine the relationship. Only very special functions have
kernels in the second sense!

In Figure 1.7, the ‘blocks’ on the left-hand side are the equivalence classes of
$\mathrm{KER}(F)$, and the ‘oval’ on the right is $\mathrm{Im}(F)$.


Let us look at an example. Let A and B both be the set {1,2,3,4,5}, and 
let F be the function given by the formula F(x) = x² − 6x + 10. (We do not
insist that a function should be given by a formula; but if it is, then this is a 
convenient way to specify it.) Then, as a set of ordered pairs (a subset of A x B), 
we have 


F = {(1, 5), (2, 2), (3, 1), (4, 2), (5, 5)}.


Note that, as required, every member of A occurs as the first component in
exactly one of these pairs. We have


Im(F) = {1, 2, 5},


and the equivalence classes of KER(F) are the three sets


{1,5}, {2, 4}, {3}. 
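The same calculation can be sketched in a few lines of Python. This is only an illustration: the helper names image and kernel_classes are my own, and the function F is represented as an ordinary Python function on a list A.

```python
def image(F, A):
    """Return Im(F) = {F(a) : a in A} as a set."""
    return {F(a) for a in A}


def kernel_classes(F, A):
    """Return the equivalence classes of KER(F).

    Two elements of A lie in the same class exactly when F sends
    them to the same value, so we group elements by their image.
    """
    classes = {}
    for a in A:
        classes.setdefault(F(a), set()).add(a)
    return list(classes.values())


A = [1, 2, 3, 4, 5]
F = lambda x: x * x - 6 * x + 10

print(image(F, A))           # the set {1, 2, 5}
print(kernel_classes(F, A))  # the classes {1, 5}, {2, 4}, {3}
```

Notice that there are as many classes as there are elements in the image; this is the content of Exercise 1.37.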


1.19 Permutations and combinations. The term ‘permutation’ has two 
different meanings in mathematical language, neither of which is the same as its 
everyday meaning in the world of football pools. 

In the nineteenth century, a permutation of a finite set S = {a1,...,an} 
meant an arrangement of the elements of the set in order. For example, if n = 5 
and S = {1,2,3,4,5}, then (3,5,4,1,2) is a permutation. 

There is a ‘natural’ order on this particular set, namely (1,2,3,4,5). Now 
the permutation can be produced by a ‘substitution’, the function f from the 
set S to itself which takes 1 to 3, 2 to 5, 3 to 4, and so on. Such a substitution is 
one-to-one and onto (that is, bijective); and conversely, every bijective function 
from S to itself defines a permutation. So we can regard the function f and the 
n-tuple as different aspects of the same thing. 

So we make a definition: 


Definition A permutation of a set S is a bijective function from S to S.
If S = {1,2,...,n}, then the n-tuple (f(1), f(2),..., f(n)) is called the passive
form of the permutation.


We can write a permutation in so-called two-line notation by writing the 
elements of the domain S in the first line and their images in the second. In our 
example above, the two-line notation for f is 


\[ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 5 & 4 & 1 & 2 \end{pmatrix}, \]

so that the function f maps each element in the first line to the element imme-
diately below it. In this notation, it is not compulsory to write the first line in
the usual order. We could, for example, write the same permutation as

\[ \begin{pmatrix} 2 & 5 & 1 & 3 & 4 \\ 5 & 2 & 3 & 4 & 1 \end{pmatrix}. \]

If the first row is in its usual order, then the second row is the passive form of
the permutation.
How many permutations of {1,...,n} are there? We can answer this by 
counting the passive forms. The first element f(1) may be any one of {1,...,n}, 


so there are n choices. The second element cannot be equal to the first (since f
is one-to-one), so there are n − 1 choices. Similarly there are n − 2 choices for the
third, and so on. Finally, the last element must be the only one not yet used. So
the total number of permutations is


n · (n − 1) · (n − 2) ⋯ 2 · 1,


the product of all natural numbers from 1 to n inclusive. This number is called 
n factorial, and written as n!. 

For convenience in later formulae, we extend the definition of the factorial 
function by setting 0! = 1. 

The notion of permutation is sometimes extended. For k ≤ n, we define a k-
permutation of S = {1,...,n} to be a k-tuple $(a_1, \ldots, a_k)$ whose members are
distinct elements from S. (This could be regarded as the passive form of a one-
to-one function from {1,...,k} to {1,...,n}.) The number of k-permutations is,
by a similar argument,

\[ n \cdot (n-1) \cdots (n-k+1) \]

(where there are k terms in the product); this number is denoted by ${}^nP_k$. Note
that

\[ {}^nP_k = \frac{n!}{(n-k)!}, \]

and in particular ${}^nP_n = n!$.

The essential property of permutations is that ‘order is important’; a 
k-permutation of S is an ordered selection of k distinct elements of S. If we 
do not care about the order in which the objects are selected, we obtain a 
k-combination of S. Thus, a k-combination is just a subset containing k 
elements. 

The number of k-combinations of an n-element set is denoted by ${}^nC_k$, or,
more commonly, by $\binom{n}{k}$. Since each k-combination can be ordered in k! ways,
we have

\[ \binom{n}{k} = \frac{{}^nP_k}{k!} = \frac{n!}{k!\,(n-k)!}. \]


If k = 0 or k = n, there is only one k-combination (the empty set or the 
whole of S, respectively). Because of our convention that 0! = 1, the formula 
does give the right answer in this case. 
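These counting formulae are easy to test by brute force. The following Python sketch lists all k-permutations and k-combinations of {1,...,5} for k = 3 using the standard itertools module and compares the counts with the formulae; the function names count_k_permutations and count_k_combinations are my own choices for illustration.

```python
from itertools import combinations, permutations
from math import factorial


def count_k_permutations(n, k):
    """Number of k-permutations of an n-element set: n!/(n-k)!."""
    return factorial(n) // factorial(n - k)


def count_k_combinations(n, k):
    """Number of k-combinations of an n-element set: n!/(k!(n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


n, k = 5, 3
S = range(1, n + 1)

listed_perms = list(permutations(S, k))   # ordered selections
listed_combs = list(combinations(S, k))   # unordered selections

print(len(listed_perms), count_k_permutations(n, k))   # 60 60
print(len(listed_combs), count_k_combinations(n, k))   # 10 10
print(count_k_combinations(n, 0), count_k_combinations(n, n))  # 1 1
```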


The numbers $\binom{n}{k}$ are usually called binomial coefficients. This is because
they occur as coefficients in the Binomial Theorem:


Theorem 1.20 (Binomial Theorem) For any natural number n,

\[ (x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}. \]

Proof We have

\[ (x+y)^n = (x+y)(x+y)\cdots(x+y) \quad (n \text{ factors}). \]

When the brackets are all expanded, we obtain a sum of many terms, each of
the form $x^k y^{n-k}$ for some k: such a term is obtained by multiplying together xs
chosen from k of the brackets and ys from the remaining brackets. The number
of ways we can choose k brackets to pick the xs from (and hence the coefficient
of $x^k y^{n-k}$) is $\binom{n}{k}$.
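If you would like to see the theorem at work, here is a small Python sketch that multiplies out the n brackets one at a time and compares the resulting coefficients with the binomial coefficients. The function name expand_power and the representation of a polynomial as a list of coefficients are assumptions made for this illustration only.

```python
from math import comb


def expand_power(n):
    """Coefficients of (x + y)^n.

    Entry k of the returned list is the coefficient of x^k y^(n-k).
    """
    coeffs = [1]                      # (x + y)^0 = 1
    for _ in range(n):
        new = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c               # pick y from the new bracket
            new[k + 1] += c           # pick x from the new bracket
        coeffs = new
    return coeffs


n = 6
print(expand_power(n))                       # [1, 6, 15, 20, 15, 6, 1]
print([comb(n, k) for k in range(n + 1)])    # the same list
```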


1.20 More on permutations. There is another convenient way to repre-
sent a permutation, namely cycle notation. Let f be a permutation of S =
{1,...,n}. Choose any point of S (let us say $a_0$), and follow what happens to
it as we apply f repeatedly. Since S is finite, we must eventually come back to
an element we have met before. Now this must be our starting point $a_0$. For let
us suppose that this procedure generates successively $a_0, a_1, \ldots, a_{r-1}$ which are
all distinct, and then $a_r$ which is equal to some $a_s$ with s < r. If s > 0 then
we have

\[ f(a_{r-1}) = a_r = a_s = f(a_{s-1}); \]

since f is one-to-one, we must have $a_{r-1} = a_{s-1}$, which contradicts our choice of
$a_r$ as the first repeat. So $a_r = a_0$. Now the r-tuple $(a_0, a_1, a_2, \ldots, a_{r-1})$ is called
a cycle of f. The notation tells us that f maps each point of the cycle except 
the last to the next one along, while the last point comes back to the first. 

If $f(a_0) = a_0$, then we have a cycle with just one element, namely $(a_0)$.

If every element of S occurs in the cycle, we are finished. Otherwise, choose a 
point which has not yet been used, and repeat the procedure. A similar argument 
shows that the cycle we generate has no elements in common with the previous 
one. Continue like this until we are finished. Then we simply juxtapose the cycles 
to obtain the cycle notation for f. 

Here is an example. Let f be the function which maps 1 ↦ 4, 2 ↦ 7, 3 ↦ 3,
4 ↦ 8, 5 ↦ 1, 6 ↦ 5, 7 ↦ 2, and 8 ↦ 6. This is a permutation: in two-line
notation it is

\[ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 4 & 7 & 3 & 8 & 1 & 5 & 2 & 6 \end{pmatrix}. \]

Start with the first element, 1. Follow its successive images under f until it
returns to its starting point:

f : 1 ↦ 4 ↦ 8 ↦ 6 ↦ 5 ↦ 1.

This gives us a cycle (1, 4, 8, 6, 5).
If this cycle contains all the elements of the set {1,...,n}, then stop.
Otherwise, choose the smallest unused element (in this case 2), and repeat the
procedure:

f : 2 ↦ 7 ↦ 2,

so we have a cycle (2, 7) disjoint from the first.

We are still not finished, since we have not seen the element 3 yet. Now
f : 3 ↦ 3, so (3) is a cycle with a single element. Now we have the cycle
decomposition:

f = (1, 4, 8, 6, 5)(2, 7)(3).


By convention, in the cycle notation, we usually omit any cycle containing 
just one element. Thus the permutation (1,4,2) of {1,...,5} fixes 3 and 5. If 
every point is fixed, this convention would tell us to write nothing: instead, in 
this case, we write (1). 
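The procedure for finding the cycle notation is mechanical enough to hand to a computer. Here is a minimal Python sketch, assuming the permutation is stored as a dictionary sending each point to its image; the function name cycles is my own. Unlike the convention just described, it reports cycles of length one as well.

```python
def cycles(f):
    """Cycle decomposition of a permutation f given as {point: image}.

    Each new cycle starts from the smallest point not yet used,
    exactly as in the worked example in the text.
    """
    seen = set()
    result = []
    for start in sorted(f):
        if start in seen:
            continue
        cycle = []
        x = start
        while x not in seen:      # follow x, f(x), f(f(x)), ...
            seen.add(x)
            cycle.append(x)
            x = f[x]
        result.append(tuple(cycle))
    return result


f = {1: 4, 2: 7, 3: 3, 4: 8, 5: 1, 6: 5, 7: 2, 8: 6}
print(cycles(f))   # [(1, 4, 8, 6, 5), (2, 7), (3,)]
```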

Associated with any permutation f is a number, either +1 or −1, called the
sign of f. It is defined to be $(-1)^{n-c(f)}$, where c(f) is the number of cycles of
f in cycle notation (including cycles with just one element). For example, with
n = 5, the sign of (1, 3, 4)(2, 5) is $(-1)^{5-2} = -1$, while the sign of (1, 4, 2) is
$(-1)^{5-3} = +1$ (don’t forget the two cycles (3) and (5)). We denote the sign of the
permutation f by sign(f). We sometimes use instead the parity of f, which is
defined to be the parity (even or odd) of n − c(f). Thus even parity is the same
as sign +1, and odd parity is the same as sign −1.

Another parameter of a permutation f is its order, defined to be the smallest 
positive number m such that, if f is applied m times, then every element of 
S returns to its original position. Now if f has a cycle of length r, then the 
elements of this cycle return to their original positions whenever the number of 
applications of f is a multiple of r (and only then). So every point returns to its 
starting position if and only if the number of applications is a multiple of every 
cycle length. We conclude that the order of a permutation is the least common 
multiple of its cycle lengths. 

For example, the order of (1,3,4)(2,5) is 6. 
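Both the sign and the order depend only on the cycle lengths, so they can be computed together. The Python sketch below assumes, as an illustration, that a permutation is stored as a dictionary {point: image} which lists every point of S (fixed points included); the names cycle_lengths, sign, and order are mine.

```python
from functools import reduce
from math import gcd


def cycle_lengths(f):
    """Lengths of all cycles of f, including cycles of length one."""
    seen = set()
    lengths = []
    for start in f:
        if start in seen:
            continue
        length = 0
        x = start
        while x not in seen:
            seen.add(x)
            length += 1
            x = f[x]
        lengths.append(length)
    return lengths


def sign(f):
    """(-1)^(n - c(f)): n points, c(f) cycles."""
    n, c = len(f), len(cycle_lengths(f))
    return 1 if (n - c) % 2 == 0 else -1


def order(f):
    """Least common multiple of the cycle lengths of f."""
    return reduce(lambda a, b: a * b // gcd(a, b), cycle_lengths(f), 1)


g = {1: 3, 3: 4, 4: 1, 2: 5, 5: 2}   # (1,3,4)(2,5) on {1,...,5}
h = {1: 4, 4: 2, 2: 1, 3: 3, 5: 5}   # (1,4,2) on {1,...,5}
print(sign(g), order(g))   # -1 6
print(sign(h), order(h))   # 1 3
```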

We will say more about permutations in Chapter 3. 


Exercise 1.27 Let A and B be sets. Prove that A = B if and only if both A ⊆ B and
B ⊆ A hold.


Exercise 1.28 (a) Prove Proposition 1.15. 
(b) Prove by induction that, if A is a finite set, then $|A^n| = |A|^n$ for all positive
integers n.


Exercise 1.29 Let A be a set with m elements. Then each of the following two sets
contains $m^n$ elements:


• the set $A^n$;
• the set of functions from {1,2,...,n} to A.

Find a bijection between these two sets.


Exercise 1.30 How many functions are there from {1, 2,3} to {1, 2, 3, 4}? How many of 
these are (a) one-to-one, or (b) onto? Repeat this question for functions from {1, 2,3, 4} 
to {1, 2, 3}. 


Exercise 1.31 Each of the following attempts fails to define a function on the specified 
set. Explain why it fails, and how you could make the definition into a function. 


(a) F : R → R, F(x) = 1/x.
(b) F : C² → C², F(a, b) = (c, d) if c and d are the roots of the quadratic


x² + ax + b = 0.


Exercise 1.32 For each of the eight combinations of (Eql), (Eq2), and (Eq3), find a 
relation on the set {1,2,3} which satisfies precisely that combination of axioms. 
Can this be done on the set {1,2}? 


Exercise 1.33 Does the Equivalence Relation Theorem hold if A is the empty set? 
[How many equivalence relations are there on ∅? How many partitions of the empty set
are there?] 


Exercise 1.34 (a) Show that A x A is an equivalence relation on A. 
(b) Show that {(a,a) : a € A} is an equivalence relation on A. (This is the relation 
of equality.) 


Exercise 1.35 Show that there are exactly five equivalence relations on a set of three 
points. How many are there on a set of four points? 


Exercise 1.36 In each of the following cases, state whether the relation ~ on the set 
X is (i) reflexive, (ii) symmetric, or (iii) transitive: 


(a) X is the set of positive integers, x ~ y if x divides y. 

(b) X is the set of countries of Europe, x ~ y if x and y have a common border. 

(c) X is the set of capital cities of Europe, x ~ y if it is possible to travel from x to 
y by train. 

(d) X is the set of integers, x ~ y if x < y.

(e) X is the set of integers, x ~ y if x − y is divisible by 4.


In those cases where the relation is an equivalence, describe its equivalence classes. 


Exercise 1.37 Let F : A → B be a function. Show that F induces a bijection between
the set of equivalence classes of the kernel KER(F) and the set Im(F).


Exercise 1.38 Consider the following argument: 
False proposition If a relation is symmetric and transitive then it is reflexive.


Proof Let R be a symmetric and transitive relation. Take (x, y) ∈ R. Then
(y, x) ∈ R (since R is symmetric), and so (x, x) ∈ R (since R is transitive; put
z = x in the transitive law). So R is reflexive.


(a) Say what is wrong with this argument. 
(b) Give a counterexample to the false proposition. 
Exercise 1.39 (*) Let X be a set, and let ~ be a relation on X which is reflexive and
transitive. Write x ≡ y to mean that both x ~ y and y ~ x hold.


(a) Prove that ≡ is an equivalence relation.
(b) Suppose that x ~ y, and suppose that $x_1$ and $y_1$ belong to the equivalence classes
of x and y respectively. Prove that $x_1 \sim y_1$.


Exercise 1.40 (*) Show that ordered pairs can be defined by the rule 


(a,b) = {{a}, {a, b}}. 


[You are asked to show that {{a}, {a, b}} = {{c}, {c, d}} if and only if a = c and b = d.
Be sure to cover all cases in your argument.]


Exercise 1.41 Recall that, formally, a function is a set of ordered pairs. How many 
functions are there from the empty set to the empty set? How many of them are 
one-to-one? How many are permutations? 


Exercise 1.42 Write down the orders of all the permutations of the set {1,...,5}. 
(You should not attempt to write down all the permutations and find the order of each 
one!) 


Exercise 1.43 Show that a permutation which has odd order must be an even 
permutation. 
Is the converse true? 


Modular Arithmetic 


In this section we define the arithmetic of ‘integers mod m’ for any positive 
integer m. First, we look at Euclid’s Algorithm. 


1.21 Euclid’s Algorithm. 


Definition The greatest common divisor, or g.c.d., of two positive inte- 
gers m and n is the largest positive integer which divides both. We write it as 
gcd(m,n). 


Thus, gcd(12, 18) = 6. 

We can extend the notion of greatest common divisor to the case where 
one of the integers is equal to 0. Since any positive integer divides 0, we see 
that gcd(m,0) = m if m ≠ 0. If both m and n are zero, then gcd(0,0) is
undefined (according to our definition above), so we adopt the convention that 
gcd(0,0) = 0. 

Euclid gave a rule for finding the greatest common divisor of two natural 
numbers, based on the division algorithm. 


Theorem 1.21 Let m and n be natural numbers. Then 


\[ \gcd(m,n) = \begin{cases} m & \text{if } n = 0, \\ \gcd(n,r) & \text{if } m = nq + r \text{ with } 0 \le r < n. \end{cases} \]


Proof The first statement is true by definition. For the second, suppose that
m = nq + r. Then also r = m − nq. So any integer which divides m and n also
divides n and r, and vice versa; so the greatest common divisor of m and n is
equal to the greatest common divisor of n and r.


This is all very well, but the second line simply replaces the calculation of one 
g.c.d. by another; why does this help? Notice that r <n. This means that, if we 
apply Euclid’s Theorem repeatedly, the second number of the pair of numbers 
whose g.c.d. we are finding gets smaller at each step. This cannot go on indef- 
initely; sooner or later, it becomes zero, and the first line applies. So Euclid’s 
Theorem gives us a constructive method to calculate the g.c.d. of two natural 
numbers. We refer to it as Euclid’s Algorithm. (An ‘algorithm’ is just a con- 
structive method, like a recipe or a set of directions, for achieving some result.) 
Here it is more formally:


To find gcd(m, n):

Put $a_0 = m$ and $a_1 = n$.

As long as the last number $a_k$ found is non-zero, put $a_{k+1}$ equal
to the remainder when $a_{k-1}$ is divided by $a_k$.

When the last number $a_k$ is zero, then the g.c.d. is $a_{k-1}$.


An example should make it clear. 


Example Find gcd(198, 78).
$a_0 = 198$, $a_1 = 78$.
198 = 2 · 78 + 42, so $a_2 = 42$.
78 = 1 · 42 + 36, so $a_3 = 36$.
42 = 1 · 36 + 6, so $a_4 = 6$.
36 = 6 · 6 + 0, so $a_5 = 0$.

So gcd(198, 78) = 6.
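Euclid’s Algorithm translates almost word for word into a program. Here is a minimal Python sketch; the function names euclid_gcd and euclid_sequence are my own, and the second function simply records the numbers $a_0, a_1, a_2, \ldots$ so that they can be compared with the example.

```python
def euclid_gcd(m, n):
    """gcd of natural numbers m and n, following Theorem 1.21.

    gcd(m, 0) = m, and otherwise gcd(m, n) = gcd(n, r),
    where r is the remainder when m is divided by n.
    """
    while n != 0:
        m, n = n, m % n
    return m


def euclid_sequence(m, n):
    """The sequence a_0, a_1, ... produced by the algorithm, ending in 0."""
    seq = [m, n]
    while seq[-1] != 0:
        seq.append(seq[-2] % seq[-1])
    return seq


print(euclid_sequence(198, 78))   # [198, 78, 42, 36, 6, 0]
print(euclid_gcd(198, 78))        # 6
print(euclid_gcd(0, 0))           # 0, agreeing with our convention
```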


Euclid’s Algorithm actually does more than this. It expresses the greatest 
common divisor of m and n in terms of the original numbers. 


Theorem 1.22 For any two natural numbers m and n, there exist integers x 
and y such that gcd(m,n) = xm + yn.


I will not give a proof here: we will see this in much greater generality in 
the next chapter. But here is an example to show how it works. Refer to the 
preceding example showing that gcd(198, 78) = 6. 


Example 
6 = 42 − 36
  = 42 − (78 − 42) = 2 · 42 − 78
  = 2(198 − 2 · 78) − 78 = 2 · 198 − 5 · 78,


so x = 2, y = −5.
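The back-substitution in this example can also be carried out as the algorithm runs, by keeping track of how each remainder is expressed in terms of m and n. The following Python sketch does this; it is one common way of organising the calculation and not necessarily the way we will prove the theorem later, and the name extended_gcd is my own.

```python
def extended_gcd(m, n):
    """Return (d, x, y) with d = gcd(m, n) and d = x*m + y*n."""
    # Invariants maintained throughout the loop:
    #   a = x0*m + y0*n   and   b = x1*m + y1*n
    a, b = m, n
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


d, x, y = extended_gcd(198, 78)
print(d, x, y)              # 6 2 -5
print(x * 198 + y * 78)     # 6
```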
1.22 The integers mod m.


Definition Let m be a positive integer. We define a relation on the set Z,
called ‘congruence modulo m’ and written $\equiv_m$, by the rule


\[ x \equiv_m y \iff m \mid (y - x). \]


Often, instead of $x \equiv_m y$, we write x ≡ y (mod m). The meaning is exactly the
same.


Proposition 1.23 The relation $\equiv_m$ is an equivalence relation on Z with
m equivalence classes. The numbers 0, 1, ..., m − 1 are representatives of the
equivalence classes. 


The proof is left as an exercise. 

Now we denote the set of equivalence classes by $\mathbb{Z}_m$; this is a set with m
elements. We denote the equivalence class containing the integer x by $[x]_m$. So
we can write


\[ \mathbb{Z}_m = \{[0]_m, [1]_m, \ldots, [m-1]_m\}. \]


Sometimes we will be lazy and just write x instead of $[x]_m$.
Now we can do arithmetic with the elements of $\mathbb{Z}_m$: we add and multiply
them by the rules


\[ [x]_m + [y]_m = [x+y]_m, \qquad [x]_m \cdot [y]_m = [xy]_m. \]


There is a problem with these definitions. Since $[x]_m$ means ‘the equivalence
class containing x’, you would be within your rights to use different represent-
atives for the two equivalence classes. Then adding and multiplying them will give
different representatives for the classes $[x]_m + [y]_m$ and $[x]_m \cdot [y]_m$. Is it possible
that we could actually arrive at different classes? If so, our definitions would be
no good! In fact, this is not possible. Here is the argument for multiplication; try
addition for yourself.

Suppose that $[x]_m = [x']_m$ and $[y]_m = [y']_m$. Then, by definition, $x \equiv_m x'$
and $y \equiv_m y'$, so $x' = x + um$ and $y' = y + vm$ for some integers u and v. But
then


\[ x'y' = (x + um)(y + vm) = xy + (uy + vx + muv)m, \]


so $xy \equiv_m x'y'$, and $[xy]_m = [x'y']_m$, as required.
Here, for example, are the addition and multiplication tables of $\mathbb{Z}_4$. We write
x instead of $[x]_4$ in these tables, and we use the representatives 0, 1, 2, 3 for the
equivalence classes.


+ | 0 1 2 3        · | 0 1 2 3
--+--------        --+--------
0 | 0 1 2 3        0 | 0 0 0 0
1 | 1 2 3 0        1 | 0 1 2 3
2 | 2 3 0 1        2 | 0 2 0 2
3 | 3 0 1 2        3 | 0 3 2 1
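Tables like these are tedious to write out by hand but easy to generate. The Python sketch below builds them for any m, using the representatives 0, 1, ..., m − 1, and also spot-checks the argument above that the product does not depend on the representatives chosen. The function names are my own.

```python
def addition_table(m):
    """Row x, column y holds the representative of [x]_m + [y]_m."""
    return [[(x + y) % m for y in range(m)] for x in range(m)]


def multiplication_table(m):
    """Row x, column y holds the representative of [x]_m . [y]_m."""
    return [[(x * y) % m for y in range(m)] for x in range(m)]


for row in addition_table(4):
    print(row)
for row in multiplication_table(4):
    print(row)

# Replacing x by x + um and y by y + vm never changes the class of
# the product (checked here for a few values of u and v only).
m, x, y = 4, 3, 2
well_defined = all(
    ((x + u * m) * (y + v * m)) % m == (x * y) % m
    for u in range(-3, 4)
    for v in range(-3, 4)
)
print(well_defined)   # True
```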


Subtraction is always possible in $\mathbb{Z}_m$, but division is not. We say that y is
the inverse of x mod m if $[x]_m [y]_m = [1]_m$. (If $[x]_m$ has an inverse, then we can
divide by it simply by multiplying by its inverse.)


Theorem 1.24 In $\mathbb{Z}_m$, the element $[x]_m$ has an inverse if and only if
gcd(x,m) = 1. 


Proof If $[x]_m$ has an inverse $[y]_m$, then $xy \equiv_m 1$, so xy = 1 + um for some
integer u. Let d = gcd(x,m). Then d | x and d | m, so d | xy − um = 1; thus we
must have d = 1.

Conversely, suppose that gcd(x,m) = 1. Now Euclid’s Algorithm gives us
numbers y and v such that xy + vm = 1; hence $xy \equiv_m 1$, or $[x]_m [y]_m = [1]_m$.


Now calculations can be done in $\mathbb{Z}_m$ as if the elements were ordinary numbers
(but remembering that x and y represent the same element if they are congruent
mod m). 


Example Find $\frac{2}{3} + \frac{3}{5}$ in $\mathbb{Z}_{13}$.
First method Find the inverses of 3 and 5 mod 13:


$3 \cdot 9 \equiv_{13} 1$, so 1/3 = 9;
$5 \cdot 8 \equiv_{13} 1$, so 1/5 = 8.


(How did I find these? Either by trial and error, or by the method based on
Euclid’s Algorithm explained in the last subsection.)
Hence 2/3 + 3/5 = 2 · 9 + 3 · 8 = 42 = 3, or $[3]_{13}$ to be more accurate.
Second method

\[ \frac{2}{3} + \frac{3}{5} = \frac{2 \cdot 5 + 3 \cdot 3}{3 \cdot 5} = \frac{19}{15} = \frac{6}{2} = 3. \]
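Both methods are easy to check by machine. The Python sketch below finds inverses by trial and error, as suggested above, and returns None when Theorem 1.24 says that no inverse exists. The function name inverse_mod is my own, and trial and error is chosen only because it is the simplest method to read; for large m one would use Euclid’s Algorithm instead.

```python
from math import gcd


def inverse_mod(x, m):
    """Inverse of x mod m by trial and error, or None if gcd(x, m) != 1."""
    if gcd(x, m) != 1:
        return None
    for y in range(m):
        if (x * y) % m == 1 % m:
            return y


inv3 = inverse_mod(3, 13)
inv5 = inverse_mod(5, 13)
print(inv3, inv5)                        # 9 8

# First method: 2/3 + 3/5 = 2*(1/3) + 3*(1/5)
print((2 * inv3 + 3 * inv5) % 13)        # 3

# Second method: 19/15 = 19 * (1/15)
print((19 * inverse_mod(15, 13)) % 13)   # 3

print(inverse_mod(2, 4))                 # None, since gcd(2, 4) = 2
```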


Exercise 1.44 Find all solutions of the equation x² = 2 (a) in $\mathbb{Z}_{17}$; (b) in $\mathbb{Z}_{19}$.


Exercise 1.45 (a) Let p be a prime number. Prove that the binomial coefficient $\binom{p}{k}$
is a multiple of p for 1 ≤ k ≤ p − 1.

(b) Use the Binomial Theorem and induction on n to show that, if p is a prime
number, then $n^p \equiv_p n$ for all natural numbers n. [Hint: Expand $(n+1)^p$ by the Binomial
Theorem; all terms except the first and the last are divisible by p.]

Remark This result is known as Fermat’s Little Theorem.


Exercise 1.46 (*) In this question we look at the converse of Fermat’s Little 
Theorem. 


(a) Show that $2^{341} \equiv_{341} 2$, but 341 is not a prime. (The number 341 is called a
pseudoprime to base 2.)


(b) Show that $a^{561} \equiv_{561} a$ for any integer a, but 561 is not a prime. (The number 561
is called a Carmichael number, after Robert D. Carmichael, who first studied
numbers with this property.) Hint: Show that

• if p is prime and m = k(p − 1) + 1, then $a^m \equiv_p a$;
• if m is a product of distinct prime numbers then $a \equiv_m b$ holds if and
only if $a \equiv_p b$ for each prime number p dividing m.
(c) Can you find any more Carmichael numbers? 


Matrices 


Another familiar system of objects which make up part of the subject-matter of 
Algebra consists of matrices. 

Anyone who has used a spreadsheet program knows the importance of 2- 
dimensional tables of numbers. A matrix is just such a table. 


Definition A matrix of size m × n, or an m × n matrix, is an array of numbers
with m rows and n columns.


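We denote the entry in row i and column j of the matrix A by $(A)_{ij}$, or often
in lower-case form by $a_{ij}$. So


\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]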


For small matrices, we may want to avoid writing lots of subscripts by writing, 
for example, a 2 x 2 matrix A as 


\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]

1.23 Matrices and determinants. A matrix is simply a table of numbers. 
However, if the matrix is square, there is a single number which can be calcu- 
lated from it (whose theoretical significance we will see later on). This is the 
determinant of the matrix. 

Two notations are commonly used. We write the determinant of the matrix 
A as det(A) or det A. Alternatively, if the matrix is written out as a table (in
round brackets), we denote the determinant by the same table surrounded by
vertical bars; thus

\[ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix}. \]


How is it defined? The rule can be written down easily in small cases: 


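\[ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc, \]

\[ \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = aei + bfg + cdh - ceg - bdi - afh. \]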


The process can be described as follows. We form all the terms which can be 
made by choosing one element from each row and column of the matrix. Some 
of these are given minus signs, and the results added up. Which terms get which 
signs? 

For 2 x 2 matrices, the term on the north-west to south-east diagonal is given 
a + sign and the term on the north-east to south-west diagonal has a — sign. 
The same rule holds in the 3 x 3 case if we imagine the matrix as a tile which is 
repeated. One negative term is highlighted. 


a b c a b
d e f d e
g h i g h
a b c a b
d e f d e


This rule does not extend to larger matrices, however. The correct rule uses 
the notion of the sign of a permutation, which we defined in the preceding section. 
Notice that any term obtained by choosing one element from each row and one 
from each column can be specified by a function f: we choose the element from 
row k and column f(k), for k = 1,...,n. This function must be a permutation,
and so has a sign, which we denote by sign(f). Now this sign is affixed to the 
term in the expansion. 

Thus, the general formula is: if A = $(a_{ij})$ is an n × n matrix, then


\[ \det(A) = \sum_{f \in S_n} \mathrm{sign}(f)\, a_{1f(1)}\, a_{2f(2)} \cdots a_{nf(n)}, \]


where $S_n$ is the set of all permutations of {1,...,n}. So the number of terms in
the sum is n!.

In the case n = 3, the term bdi is selected by the permutation mapping 1 to
2, 2 to 1, and 3 to 3, that is, (1, 2)(3) in cycle notation; its sign is $(-1)^{3-2} = -1$,
as indeed we saw.
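The general formula can be turned directly into a program, though with n! terms it is hopeless for large matrices and is offered here only as a sketch to make the definition concrete. I assume a matrix is stored as a list of rows and number the rows and columns from 0 rather than 1, as is usual in Python; the function names sign and det are my own.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation of {0,...,n-1}, given as a tuple of images.

    Computed as (-1)^(n - c), where c is the number of cycles.
    """
    n = len(perm)
    seen = [False] * n
    c = 0
    for start in range(n):
        if not seen[start]:
            c += 1
            x = start
            while not seen[x]:
                seen[x] = True
                x = perm[x]
    return 1 if (n - c) % 2 == 0 else -1


def det(A):
    """Determinant of a square matrix as a signed sum over permutations."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for k in range(n):
            term *= A[k][perm[k]]   # entry in row k, column f(k)
        total += term
    return total


print(det([[1, 2], [3, 4]]))                     # -2
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3
```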


1.24 Addition and multiplication. As mentioned earlier, matrices can be 
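added and multiplied. Here are the rules for 2 × 2 matrices. Let A and B be 2 × 2
matrices: say

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad B = \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}. \]

We define addition and multiplication by the rules

\[ A + B = \begin{pmatrix} a+a' & b+b' \\ c+c' & d+d' \end{pmatrix}, \]

\[ AB = \begin{pmatrix} aa'+bc' & ab'+bd' \\ ca'+dc' & cb'+dd' \end{pmatrix}. \]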


The rule for addition is straightforward: just add the entries in corresponding 
positions in the two matrices. The rule for multiplication seems to have little 
rhyme or reason to it. You may well have met this rule in geometry and seen 
how it arises from geometric transformations. If not, you will have to wait until 
Chapter 4 for an algebraic representation. But you should not try to memorise 
the rule; it is easy to explain how it works. To find a particular entry in AB, for 
example the entry in the first row and second column, we look at the first row 
of A (containing entries a and b) and the second column of B (with entries b′
and d’). Now multiply each entry in the chosen row of A by the corresponding 
entry in the chosen column of B (giving ab’ and bd’), and add these to obtain 
the entry of AB in the required position. 

For example: 


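\[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}, \]

where the entry in the first row and second column is 1 · 6 + 2 · 8 = 22.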


Now it is possible to show that most of the now-familiar properties hold for 
matrices: addition is commutative and associative, multiplication is associative, 
and the distributive law holds. But there is a surprise: multiplication is not 
commutative. Take the matrices in the above example: let 


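\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}. \]

Now check that

\[ AB = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}, \qquad BA = \begin{pmatrix} 23 & 34 \\ 31 & 46 \end{pmatrix}. \]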


There is nothing special about these two matrices; they happened to be the first 
pair that I thought up. It is quite rare for two matrices to commute! 


Now we could do as we did for polynomials, and examine the proofs of those 
laws that do hold to see what properties of the matrix entries are being used. We 
find that these are just the same properties as we are proving for the matrices 
(commutativity and associativity for addition, associativity for multiplication, 
distributive laws). In this way, we prove general theorems for matrices with 
entries from a whole range of number systems. 


For completeness, we define addition and multiplication for matrices of arb-
itrary size. Let A and B be matrices; let A have (i, j) entry $a_{ij}$ (this is shorthand
for the entry in the ith row and jth column), and let B have (i, j) entry $b_{ij}$.

The (i, j) entry of A + B is just the sum $a_{ij} + b_{ij}$ of the corresponding entries
of A and B. In order for this to make sense, we need that for each entry of A 
there is an entry in the same position in B and vice versa; that is, A and B must 
have the same size. Hence 


A+ B is defined if and only if A and B are both m x n matrices, 
for some m and n; then A+ B is also m x n. 


The (i, j) entry of AB is worked out as we described above. We select the
ith row of A and the jth column of B; then we multiply each element $a_{ik}$ of the
first by the corresponding element $b_{kj}$ of the second, and add all these terms,
obtaining the formula


\[ \sum_{k} a_{ik} b_{kj} \]


for the (i, j) entry of AB. For this to work, the number of elements in each row
of A (which is the number of columns of A) must be equal to the number of 
elements in each column of B (which is the number of rows of B). So we have 


AB is defined if and only if A is m x n and B is n x p for some 
m,n,p; then AB is m x p. 


(And, in the above sum, k runs from 1 to n.)
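These two definitions, together with their conditions on the sizes, can be written as a short Python sketch. I assume a matrix is stored as a list of rows, with rows and columns numbered from 0; the function names mat_add and mat_mul are my own.

```python
def mat_add(A, B):
    """Sum of two matrices of the same size."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("A + B needs A and B to have the same size")
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]


def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B."""
    n = len(B)
    if len(A[0]) != n:
        raise ValueError("AB needs columns of A to equal rows of B")
    p = len(B[0])
    # The (i, j) entry is the sum over k of a_ik * b_kj.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_add(A, B))   # [[6, 8], [10, 12]]
print(mat_mul(A, B))   # [[19, 22], [43, 50]]
print(mat_mul(B, A))   # [[23, 34], [31, 46]]
```

The last two lines show again that multiplication is not commutative.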

In particular, if A and B are both square matrices (this means that they have 
the same number of rows as columns) of the same size, then both the sum A+ B 
and the product AB are defined. We will see later that, in this case, 


det(AB) = det(A) - det(B). 


However, it is not true that det(A + B) = det(A) + det(B). 
1.25 Linear equations. The theory of matrices grew from the problem of
solving equations, specifically linear equations in several variables. Such a system,
involving m equations in n variables, can be written in the form


\[ \begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m.
\end{aligned} \]


Such a system gives us a matrix of coefficients, the table of numbers $a_{ij}$:


\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]


Such a system of equations may have no solution, or a unique solution, or 
more than one solution. Matrix theory allows us to determine which possibility 
occurs and gives us tools to calculate the solutions if they exist. Here, without 
proof, is part of the answer for the case where the numbers of equations and 
variables are equal. Suppose that the system of equations is 


\[ \begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n.
\end{aligned} \]


Let A be the matrix of coefficients, and let b be the n × 1 matrix (the column)
with entries $b_1, b_2, \ldots, b_n$. Let $B_i$ be the matrix obtained from A by replacing
the ith column by the column b.


Theorem 1.25 With the above notation, 
(a) If det(A) ≠ 0, then the equations have a unique solution, given by

\[ x_i = \frac{\det(B_i)}{\det(A)}, \qquad i = 1, \ldots, n. \]
(b) If det(A) = 0, then either the equations have no solution, or they have more 
than one solution. 


This result perhaps explains the name of the mysterious ‘determinant’ 
function: it determines whether or not the equations have a unique solution. 
The formula for the solution in part (a) is Cramer’s Rule. 

We will see the proof of this theorem, and give details of how to calculate 
which possibility occurs, in Chapter 4. 
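For example, the equations

\[ \begin{aligned} 3x + 2y &= 22, \\ 4x + 3y &= 31 \end{aligned} \]

have the unique solution

\[ x = \frac{\begin{vmatrix} 22 & 2 \\ 31 & 3 \end{vmatrix}}{\begin{vmatrix} 3 & 2 \\ 4 & 3 \end{vmatrix}} = \frac{66 - 62}{9 - 8} = 4, \qquad
y = \frac{\begin{vmatrix} 3 & 22 \\ 4 & 31 \end{vmatrix}}{\begin{vmatrix} 3 & 2 \\ 4 & 3 \end{vmatrix}} = \frac{93 - 88}{9 - 8} = 5. \]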
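Cramer’s Rule is simple to program for small systems. The Python sketch below is only an illustration under my own naming (det and cramer): it computes determinants from the permutation formula, taking the sign of a permutation from the parity of its number of inversions (a standard equivalent of the definition by cycles), and uses exact fractions so that no rounding occurs. When det(A) = 0 it returns None, since then part (b) of the theorem applies and there is no unique solution.

```python
from fractions import Fraction
from itertools import permutations


def det(A):
    """Determinant of a square matrix (list of rows) by the permutation formula."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        # An odd number of inversions means the permutation has sign -1.
        inversions = sum(1 for i in range(n)
                         for j in range(i + 1, n) if p[i] > p[j])
        term = -1 if inversions % 2 else 1
        for k in range(n):
            term *= A[k][p[k]]
        total += term
    return total


def cramer(A, b):
    """Unique solution of Ax = b by Cramer's Rule, or None if det(A) = 0."""
    d = det(A)
    if d == 0:
        return None
    solution = []
    for i in range(len(A)):
        # B_i: replace the ith column of A by the column b.
        B_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(B_i), d))
    return solution


print(cramer([[3, 2], [4, 3]], [22, 31]))   # x = 4, y = 5 (as Fractions)
print(cramer([[1, 2], [2, 4]], [1, 2]))     # None: the determinant is zero
```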


Exercise 1.47 Prove the distributive law for 2 x 2 matrices. What properties of the 
matrix entries are used in your proof? 


Exercise 1.48 Look at the example of non-commutative matrices in the text. Observe 
that the sum of the elements on the ‘main diagonal’ is the same for AB as for BA (that 
is, 19 + 50 = 23 + 46). Is this a coincidence? 


Exercise 1.49 An n × n matrix A = $(a_{ij})$ is upper triangular if $a_{ij} = 0$ whenever i >
j. (This means that all elements below the ‘main diagonal’ $a_{11}, a_{22}, \ldots, a_{nn}$ are zero.)
Prove that the sum and product of upper triangular matrices are upper triangular. 
Is multiplication of upper triangular matrices commutative? 


Exercise 1.50 Solve the following equations: 


 x +  2y +  3z = 10,
2x +  5y + 10z = 26,
3x + 10y + 26z = 55.


Exercise 1.51 Solve the following equations, for any real number c: 


 x + 2y +  3z = 10,
2x + 5y + 10z = 26,
3x + 8y + 17z = c.


Exercise 1.52 Professor Fibonacci buys a pair of newborn rabbits at the beginning of
month 0. Assume that rabbits are infertile until they are two months old, from which
time each pair produces one pair of offspring every month for ever. Let $x_n$ be the
number of pairs of baby rabbits at the start of month n, and $y_n$ the number of pairs
of rabbits which are at least one month old. Show that

\[ \begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_n \\ y_n \end{pmatrix}, \]

and deduce that

\[ \begin{pmatrix} x_n \\ y_n \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^{n} \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \]

How many pairs of rabbits does the Professor have after a year?


Appendix: Logic 

Reasoning and logic are to each other as health is to medicine, or— 
better—as conduct is to morality. Reasoning refers to a gamut of 
natural thought processes in the everyday world. Logic is how we 
ought to think if objective truth is our goal—and the everyday 
world is very little concerned with objective truth. Logic is the 
science of the justification of conclusions we have reached by nat- 
ural reasoning. 


Julian Jaynes (1976). 


1.26 Logic and truth tables. We are usually able to tell without too much 
trouble whether or not a simple mathematical argument is valid. For example, 
the argument 


All men are mortal; grass is mortal; therefore all men are grass 


is not logically valid, although it may express a poetic truth. In more complicated 
cases, it is good to know that the rules of logic have been codified and can be 
applied mechanically. 

We build up expressions and arguments from basic propositions or state- 
ments, each of which may be either true (T) or false (F). For the purpose of the 
argument, it does not matter what these propositions are; the logical validity of 
an argument should not depend on the meanings of its propositions. We denote 
propositions by the letters p,q,r... 

We are allowed to combine propositions with various connectives, as fol- 
lows. For each connective, we give its meaning in words and then in a truth 
table. All we are interested in is which combinations of truth values of the basic 
propositions make the compound proposition true. 


• Conjunction, ‘and’. The combination ‘p and q’ is true if both p and q are
true, and is false in all other cases. We write ‘p and q’ as p ∧ q (or sometimes
p&q). The truth table is


p | q | p ∧ q
T | T |   T
T | F |   F
F | T |   F
F | F |   F
• Disjunction, ‘or’. The combination ‘p or q’ is true if either p or q is true (or
possibly both). This connective is sometimes called ‘inclusive or’, as distinct
from ‘exclusive or’, which is true only when exactly one of p and q is true.
We write ‘p or q’ as p ∨ q. The truth table is


p | q | p ∨ q
T | T |   T
T | F |   T
F | T |   T
F | F |   F


• Negation, ‘not’. The truth value of ‘not p’ is opposite to that of p: it is
false when p is true, and vice versa. Writing ‘not p’ as ¬p, the truth table is


p | ¬p
T |  F
F |  T


• Implies, ‘if ... then ...’. The truth table for ‘p implies q’, written p ⇒ q,
is a bit surprising at first. The implication is true in all cases except when p
is true and q is false. We return to this at the end of the section. The truth
table is


p | q | p ⇒ q
T | T |   T
T | F |   F
F | T |   T
F | F |   T


• Equivalent, ‘if and only if’. This connective, written p ⇔ q, is true if
and only if p and q have the same truth value (both true or both false). The
truth table is


p | q | p ⇔ q
T | T |   T
T | F |   F
F | T |   F
F | F |   T


Note that p ⇔ q means the same as p ⇒ q and q ⇒ p.


Every valid rule of logic can be proved using truth tables. For example,
consider the technique of ‘proof by contradiction’. In order to prove a proposition p,
we assume the negation of p, that is, ¬p, and deduce a contradiction x (whose
truth value is F). In other words, we prove (¬p) ⇒ x. It follows from the truth
table for implication that, if x is false and (¬p) ⇒ x is true, then ¬p is false;
that is, p is true.

Again, consider the ‘proof’ that all men are grass. We are given p ⇒ r and
q ⇒ r, and are supposed to deduce that p ⇒ q. But if it happens that p and r
are true and q is false, then both p ⇒ r and q ⇒ r are true but p ⇒ q is false.
So the deduction is not valid.
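
The mechanical check described here can be sketched in a few lines of Python. This is only an illustration: the names implies and is_valid are my own, and an argument is taken to be valid when its conclusion is true in every row of the truth table in which all the premises are true.

```python
from itertools import product

def implies(p, q):
    """Truth table for 'p implies q': false only when p is true and q is false."""
    return (not p) or q

def is_valid(premises, conclusion, n_vars):
    """Search every row of the truth table for a row in which all the
    premises are true but the conclusion is false."""
    for values in product([True, False], repeat=n_vars):
        if all(prem(*values) for prem in premises) and not conclusion(*values):
            return False  # found a counterexample row
    return True

# 'All men are grass': from p => r and q => r, try to deduce p => q.
grass_valid = is_valid(
    premises=[lambda p, q, r: implies(p, r), lambda p, q, r: implies(q, r)],
    conclusion=lambda p, q, r: implies(p, q),
    n_vars=3,
)

# Proof by contradiction: from (not p) => x and 'x is false', deduce p.
contradiction_valid = is_valid(
    premises=[lambda p, x: implies(not p, x), lambda p, x: not x],
    conclusion=lambda p, x: p,
    n_vars=2,
)
```

Here grass_valid comes out False (the row with p and r true and q false is the counterexample), while contradiction_valid comes out True.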


The rule that if p is false then p ⇒ q is true, whatever proposition q may be,
is often found perplexing. But it does agree with everyday usage. Suppose that 
I say to you, ‘If it is fine tomorrow, we will go to the Zoo.’ If it rains tomorrow 
then, whether or not we go to the Zoo, I did not lie to you! The only case in 
which I have lied would be if it is fine and we do not go to the Zoo. 

Bertrand Russell was asked about this by a philosopher, who said, ‘Is it true 
that, if 1+ 1 = 3, then you are the Pope?’ Russell improvised the following 
argument on the spot. 


Suppose that 1+ 1 = 3. Subtracting 1 from each side we obtain 
1 = 2. Now the Pope and I are two; therefore we are one. 


1.27 Sets and logic. An unexpected application of truth tables is to proving
complicated identities about sets. Let A, B, C, ... be sets, contained in some large
set U (the ‘universe’), and let p be the proposition asserting that x ∈ A, q the
proposition x ∈ B, r the proposition x ∈ C, and so on. Each of p, q, r, ... may
be true or false, depending on the point x. But, for example, p ∧ q is true if and
only if x ∈ A ∩ B, so p ∧ q represents the set A ∩ B. Similarly, p ∨ q represents
A ∪ B; ¬p represents A′ = U \ A (the complement of A in the ‘universe’); and
other connectives correspond to other combinations. Now suppose that we have
to prove an identity, say


A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).


This will follow if we can show that the two propositions p ∧ (q ∨ r) and
(p ∧ q) ∨ (p ∧ r) are equivalent (that is, always have the same truth value). This
can be done quite mechanically as follows:


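p | q | r | q ∨ r | p ∧ (q ∨ r) | p ∧ q | p ∧ r | (p ∧ q) ∨ (p ∧ r)
T | T | T |   T   |      T      |   T   |   T   |         T
T | T | F |   T   |      T      |   T   |   F   |         T
T | F | T |   T   |      T      |   F   |   T   |         T
T | F | F |   F   |      F      |   F   |   F   |         F
F | T | T |   T   |      F      |   F   |   F   |         F
F | T | F |   T   |      F      |   F   |   F   |         F
F | F | T |   T   |      F      |   F   |   F   |         F
F | F | F |   F   |      F      |   F   |   F   |         F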

The result follows because the entries in the columns labelled p ∧ (q ∨ r) and
(p ∧ q) ∨ (p ∧ r) are identical.

Truth tables thus formalise the arguments using Venn diagrams that we met
earlier. 
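
The same comparison of columns can be left to a machine. The following Python sketch (the function name equivalent is my own) checks that two compound propositions agree on every row, and then spot-checks the corresponding set identity on three small sets chosen purely for illustration.

```python
from itertools import product

def equivalent(f, g, n_vars):
    """Two compound propositions are equivalent if they have the same
    truth value in every row of the truth table."""
    return all(f(*row) == g(*row) for row in product([True, False], repeat=n_vars))

# p and (q or r)  versus  (p and q) or (p and r)
distributive = equivalent(
    lambda p, q, r: p and (q or r),
    lambda p, q, r: (p and q) or (p and r),
    3,
)

# A non-identity for comparison: p or (q and r)  versus  (p or q) and r
not_an_identity = equivalent(
    lambda p, q, r: p or (q and r),
    lambda p, q, r: (p or q) and r,
    3,
)

# Spot check of the set identity on example sets (illustrative values only).
A, B, C = {1, 2, 3}, {2, 4}, {3, 4, 5}
sets_agree = (A & (B | C)) == ((A & B) | (A & C))
```

The truth-table check proves the identity for all sets at once; the final line merely confirms it in one instance.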


Exercise 1.53 Prove that (p ⇒ q) ∨ (q ⇒ p) is logically valid (that is, true for all
combinations of truth values of p and q).


Remark This shows clearly that logical implication does not involve any 
material connection between propositions! 


Exercise 1.54 Explain why the words ‘necessary’ and ‘sufficient’ are used as described 
on page 6. 


Exercise 1.55 If p and q are the propositions x ∈ A and x ∈ B respectively, show
that x ∈ A △ B is the proposition ¬(p ⇔ q). What set is represented by p ⇒ q?


Exercise 1.56 Prove, using truth tables, that (A ∪ B)′ = A′ ∩ B′ for any two sets A
and B.


Miscellaneous exercises 


Exercise 1.57 Find the least positive integer N such that every integer n > N can 
be written in the form 4a + 7b for some choice of integers a > 0 and b > 0. Prove that 
your N has this property. 


Exercise 1.58 True or false? Give reasons. 


(a) The square of any integer is congruent to 0 or 1 mod 4. 
(b) For any natural number n > 3, (n² + n + 2)/2 is a prime number.


Exercise 1.59 Let S be the set of all 2 x 2 real matrices of the form


(  cos θ   sin θ )
( −sin θ   cos θ )


for real numbers θ.


(a) Let A and B be two matrices in S. Prove that AB € S. 
(b) Is multiplication of matrices in S commutative? 


Exercise 1.60 Find all complex numbers z satisfying z® = 1. 


Exercise 1.61 State the Binomial Theorem, and use it to evaluate 


S- i) Caos 


r=0 


Exercise 1.62 (*) (a) Let n be a positive integer with the property that every positive
integer m < n/2 is a divisor of n. Show that n is 1, 2, 3, 4, or 6.

(**) (b) Let n be a positive integer with the property that every positive integer
m < √n is a divisor of n. Show that n is a divisor of 24.

Exercise 1.63 Here is an extract from some lecture notes on Geometry.


The scalar product is a way of multiplying two vectors to produce 
a scalar (a real number). 

Let u, v be non-zero vectors represented by AB, AC. We define
the angle between u and v to be the angle θ (in radians), with
0 ≤ θ ≤ π, between AB and AC.

The scalar product (or dot product) u · v is |u||v| cos θ, if u and
v are non-zero vectors and θ is the angle between them. If u = 0
or v = 0 then we define u · v = 0.


(a) What is this? Is it a definition, a theorem, a proof, an example, or just some 
chit-chat? 

(b) Your boss asks you to summarise it in a brief bullet point, without using any 
symbols. What would you say? 

(c) Why are the zeros in the last sentence written in different typefaces? 


Exercise 1.64 Prove that, for any two positive integers m and n,

gcd(m, n) · lcm(m, n) = mn,


where gcd(m, n) and lcm(m, n) are the greatest common divisor and least common
multiple of m and n.
Does any similar result hold for three positive integers? 


Exercise 1.65 What is the largest possible order of a permutation of {1,2,...,8}? 
Give an example of such a permutation. Are there any permutations with this order 
which have odd parity? Are there any which have even parity? 


Exercise 1.66 Give an example of a real polynomial of degree 4 which is reducible 
even though it has no real roots. 
Rings and subrings


This chapter is about rings, which are abstract systems in which addition and 
multiplication are defined. The prototype is the ring Z of integers, with the 
ordinary arithmetic operations. This is the example to which we keep coming 
back: we are interested in seeing how far the familiar properties of integers (as 
outlined in Chapter 1) can be extended. 


2.1 Introduction. A ring is a set with two binary operations called addi- 
tion and multiplication. We use the same notation for these operations in a 
general ring as in the integers: addition is represented by an infix +, and
multiplication by either juxtaposition or an infix ·. (That is, we denote the sum of a
and b by a + b, and their product by either ab or a · b.)

A ring is defined by a list of axioms, which follows. These are divided into 
three groups, involving addition, multiplication, and both operations, respec- 
tively. These are meant to be familiar properties of Z. I will assume without 
proof that they all hold in Z. 


Axioms for addition

(A0) (Closure law): For all a, b ∈ R, a + b ∈ R.

(A1) (Associative law): a + (b + c) = (a + b) + c for all a, b, c ∈ R.

(A2) (Zero law): There exists 0 ∈ R such that a + 0 = 0 + a = a for all a ∈ R.

(A3) (Inverse law): For all a ∈ R, there exists b ∈ R with a + b = b + a = 0.

(A4) (Commutative law): a + b = b + a for all a, b ∈ R.

Axioms for multiplication

(M0) (Closure law): For all a, b ∈ R, ab ∈ R.

(M1) (Associative law): a(bc) = (ab)c for all a, b, c ∈ R.

Mixed axiom

(D) (Distributive laws): (a + b)c = ac + bc and c(a + b) = ca + cb for all
a, b, c ∈ R.


The two distributive laws are sometimes called the left and right distribu- 
tive laws respectively. Please do not try to remember which is which; I will not 
use these names. 

The closure laws (A0) and (M0) are not strictly necessary: when we say that
addition and multiplication are operations on R, it follows that the closure laws 
must hold! We will see the reason for requiring them when we come to look at 
subrings in Section 2.4. 
Just as, when buying a personal computer, you are offered various extra
features (more RAM, larger hard disk, etc.), so it is possible to have extra features 
in your ring if you are prepared to spend a bit more. Some of these are as follows. 
A ring R is a ring with identity if it satisfies 


(M2) (Identity law): There exists 1 ∈ R with 1 ≠ 0 such that a1 = 1a = a for
all a ∈ R.


R is a division ring if it satisfies (M2) and also


(M3) (Inverse law): For all a ∈ R with a ≠ 0, there exists b ∈ R with
ab = ba = 1.


R is a commutative ring if it satisfies

(M4) (Commutative law): ab = ba for all a, b ∈ R.


Finally, a commutative division ring is called a field. 

Note that the extra multiplicative axioms are almost exact parallels of the 
additive axioms that we require in any ring. The exception is that, in the inverse 
law, we only require that non-zero elements have multiplicative inverses. We will 
see the reason for this soon. 

Many authors make the convention that a ring must have an identity. In 
other words, they assume axiom (M2) along with (A0)-(A4), (M0), (M1), 
and (D). 

In fact, some go so far as to use the word rng (sic) for a structure which I
called a ring (that is, satisfying (A0)-(A4), (M0), (M1), and (D)), so that they
can use the term rIng for ‘rng with Identity’.

Of course it is just convention. When different groups of mathematicians use 
different conventions, you are free to choose the one you like best, but you must 
accept that other people will do things differently. When we have learned a bit 
more about rings, I will explain why I took the decision I did on page 79. 

The commutative law for addition follows from the other axioms for a ring 
with identity. The simple argument for this is outlined on page 69. A different 
proof is outlined in the solution to Exercise 2.7. 


Remark Remember that the qualifying expressions in the terms ‘commutative 
ring’ and ‘ring with identity’ refer to the multiplication. The addition in a ring 
is always commutative, and there is always an identity (or zero) element for 
addition. 


2.2 Examples of rings. 


Example 1 Our prototype of a ring is the ring Z of integers. It is indeed a ring; 
in fact, it is a commutative ring with identity (but not a field, since, for example, 
there is no integer x such that 2x = 1). I assume that all of these properties of 
integers are familiar to you. To give formal proofs, it is necessary to have a 
careful definition of the integers. This is done in courses on the Foundations of 
Mathematics: we will look at the arguments in Chapter 6. 
Example 2 Other familiar number systems, such as Q (the rational numbers),
R (the real numbers), and C (the complex numbers), are fields. 


Example 3: Matrix rings Let R be any ring. Let M_n(R) denote the set of
all n x n matrices with elements in R. We can define addition and multiplication
on M_n(R) by rules which look exactly the same as those for matrices of real
numbers, which we saw in the last chapter. That is, if A = (a_ij) and B = (b_ij)
(this means that the element in row i and column j of A is a_ij, etc.), then


A + B = C = (c_ij), where c_ij = a_ij + b_ij,

AB = D = (d_ij), where d_ij = Σ_{k=1}^{n} a_ik b_kj.


(Note that the rule for matrix addition depends on addition of ring elements, 
while the rule for matrix multiplication involves calculating n products and 
adding them up. There is a potential problem here, since we can only add two 
elements at a time: we will see in the next section that it does not matter how 
we perform the additions.) 


It can be shown that M_n(R) is a ring. (See Exercise 2.2 for the case n = 2.)

If R has an identity, then so does M_n(R) (the usual identity matrix with 1
on the diagonal and 0 everywhere else). But M_n(R) is not commutative except
in trivial cases, and is never a division ring for n > 1.
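
To see these two failures concretely, here is a small Python sketch for 2 x 2 integer matrices stored as lists of rows. The helper names mat_add and mat_mul and the particular matrices A and B are my own choices for illustration.

```python
def mat_add(A, B):
    """Entrywise sum: c_ij = a_ij + b_ij."""
    n = len(A)
    return [[A[i][j] + B[i][j] for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Matrix product: d_ij is the sum over k of a_ik * b_kj."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, 1],
     [0, 0]]
B = [[0, 0],
     [1, 0]]

AB = mat_mul(A, B)   # [[1, 0], [0, 0]]
BA = mat_mul(B, A)   # [[0, 0], [0, 1]]
AA = mat_mul(A, A)   # [[0, 0], [0, 0]]
```

Since AB and BA differ, multiplication is not commutative. Since A is non-zero but AA is the zero matrix, A cannot have a multiplicative inverse, so these matrices do not form a division ring.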


Example 4: Polynomial rings For any ring R, the set R[x] of all polynomials
with coefficients in R is a ring. This is a generalisation of the familiar case of 
real polynomials. We will discuss exactly what a polynomial is, and how addition 
and multiplication should be defined in general, in Section 2.9. 


Example 5: Finite rings A finite ring can be specified by giving operation 
tables for its addition and multiplication. For obvious reasons, these tables are 
usually called addition tables and multiplication tables. 

For example, it can be shown that the structure given by


+ | 0  1        · | 0  1
0 | 0  1        0 | 0  0
1 | 1  0        1 | 0  1


is a field. This can be proved directly, but it takes some work. For example,
to verify the associative law (A1) from the tables, we have to substitute all
possible values of a, b, and c. There are two possibilities for each of these, so
there are 2³ = 8 instances of the law to be checked. Also, of course, eight instances of
the associative law for multiplication, four of the commutative law, 16 of the 
distributive law .... For larger finite rings the situation is even worse. The moral 
is that, if all else fails, this method can be used; but usually it is better to have 
a more theoretical proof! 


You probably recognised that the ring with the above tables is Z_2, the integers
mod 2. In fact, Z_m is a ring, for any positive integer m. But not all finite rings
are of this kind. Here is one which is not. 


+ | 0  1  a  b        · | 0  1  a  b
0 | 0  1  a  b        0 | 0  0  0  0
1 | 1  0  b  a        1 | 0  1  a  b
a | a  b  0  1        a | 0  a  a  0
b | b  a  1  0        b | 0  b  0  b
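
The brute-force method can be handed to a computer. The Python sketch below (the names table and is_ring are my own) stores each operation table as a dictionary keyed by pairs of elements and checks the axioms (A0)-(A4), (M0), (M1), and (D) by trying every combination. It is applied to the four-element tables above, with the elements written as strings.

```python
from itertools import product

def table(elements, rows):
    """Turn rows of an operation table (listed in the order of `elements`)
    into a dictionary keyed by pairs (row element, column element)."""
    return {(a, b): rows[i][j]
            for i, a in enumerate(elements)
            for j, b in enumerate(elements)}

def is_ring(E, add, mul, zero):
    """Check the eight ring axioms by substituting all possible values."""
    pairs = list(product(E, repeat=2))
    triples = list(product(E, repeat=3))
    # (A0) and (M0): closure must hold before the other lookups make sense.
    if not all(add[p] in E and mul[p] in E for p in pairs):
        return False
    return all([
        all(add[a, add[b, c]] == add[add[a, b], c] for a, b, c in triples),  # (A1)
        all(add[a, zero] == a == add[zero, a] for a in E),                   # (A2)
        all(any(add[a, b] == zero == add[b, a] for b in E) for a in E),      # (A3)
        all(add[a, b] == add[b, a] for a, b in pairs),                       # (A4)
        all(mul[a, mul[b, c]] == mul[mul[a, b], c] for a, b, c in triples),  # (M1)
        all(mul[add[a, b], c] == add[mul[a, c], mul[b, c]]
            and mul[c, add[a, b]] == add[mul[c, a], mul[c, b]]
            for a, b, c in triples),                                         # (D)
    ])

E = ["0", "1", "a", "b"]
add = table(E, [["0", "1", "a", "b"],
                ["1", "0", "b", "a"],
                ["a", "b", "0", "1"],
                ["b", "a", "1", "0"]])
mul = table(E, [["0", "0", "0", "0"],
                ["0", "1", "a", "b"],
                ["0", "a", "a", "0"],
                ["0", "b", "0", "b"]])

four_element_ring = is_ring(E, add, mul, "0")
one_plus_one = add["1", "1"]
```

The check succeeds, and one_plus_one is "0", whereas 1 + 1 = 2 is non-zero in the integers mod 4; so this ring is not the integers mod 4.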


Example 6: Zero rings Let R be a set with one binary operation + satisfying
axioms (A0)-(A4). (Later, we will see that such a thing is called an abelian
group.) Is it possible to define a multiplication on R so that it becomes a ring?
The answer is yes: it is always possible, by the trivial rule


ab = 0 for all a, b ∈ R,


where 0 is the zero element given by (A2). 

To prove this, we check the remaining axioms: 

(M0): For all a, b ∈ R, ab = 0 ∈ R.

(M1): For all a, b, c ∈ R, (ab)c = 0 = a(bc).

(D): For all a, b, c ∈ R, we have (a + b)c = 0, while ac + bc = 0 + 0 = 0, using
property (A2). Similarly the other way round.

A ring constructed in this manner (one in which all products are zero) is 
called a zero ring. Such rings always exist, but they are not very exciting. 


Example 7 The set of all even integers is a ring. It is commutative, but does 
not have an identity. (There is no even integer e such that ex = x for all even 
integers x.) 


Example 8: Boolean rings Just as Descartes aimed to turn geometry into 
algebra by setting up coordinates in the Euclidean plane, so Boole attempted to 
turn set theory (and logic) into algebra, as we see below. The main legacy of his 
attempt is that his name is familiar to every computer scientist. 

Let X be a set, and let R denote the power set of X, the set of all subsets of
X. (This is sometimes denoted by P(X).) We define operations on R as follows.
For A, B ⊆ X, we let A + B be the symmetric difference of A and B, the set
of all elements lying in either A or B but not both. (This is sometimes written
A △ B.) Also, we let A · B be the intersection A ∩ B.

Now R is a ring. Let us check the axioms. 

(A0): Clear.

(A1): Use a Venn diagram or truth table to show that (A + B) + C and
A + (B + C) are both equal to the set of elements which are either in all three
of the sets A, B, C or in exactly one of them.

(A2): A + ∅ = A, since nothing is in the empty set; so ∅ is the zero element.

(A3): A + A = ∅, since there is no element which lies in A but not in both A
and A.(!) So the inverse of A is A.

(A4): Clear.

(M0): Clear.

(M1): (AB)C and A(BC) both consist of the elements lying in all three sets.

(D): Prove this by means of a Venn diagram or truth table.

Now R is a commutative ring. It has an identity, namely the whole set X
(since X ∩ A = A for any A ⊆ X). But it is not a division ring if X has more
than one element: the equation A ∩ B = X can never hold if A is a proper subset
of X.

A ring of this form is called a Boolean ring. For example, when X = {0,1}, 
the addition and multiplication tables are as follows: 


+     | ∅      {0}    {1}    {0,1}
∅     | ∅      {0}    {1}    {0,1}
{0}   | {0}    ∅      {0,1}  {1}
{1}   | {1}    {0,1}  ∅      {0}
{0,1} | {0,1}  {1}    {0}    ∅

·     | ∅      {0}    {1}    {0,1}
∅     | ∅      ∅      ∅      ∅
{0}   | ∅      {0}    ∅      {0}
{1}   | ∅      ∅      {1}    {1}
{0,1} | ∅      {0}    {1}    {0,1}
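
For a slightly larger set X the axioms can be checked by machine rather than by Venn diagram. In the Python sketch below, subsets are represented as frozensets, addition is the built-in symmetric difference (^), and multiplication is intersection (&). The choice X = {0, 1, 2} and the helper name power_set are mine, for illustration only.

```python
from itertools import combinations, product

def power_set(X):
    """All subsets of X, as frozensets."""
    items = list(X)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

X = frozenset({0, 1, 2})
R = power_set(X)
EMPTY = frozenset()

def add(A, B):
    return A ^ B   # symmetric difference

def mul(A, B):
    return A & B   # intersection

triples = list(product(R, repeat=3))

add_associative = all(add(add(A, B), C) == add(A, add(B, C)) for A, B, C in triples)
distributive = all(mul(add(A, B), C) == add(mul(A, C), mul(B, C)) for A, B, C in triples)
self_inverse = all(add(A, A) == EMPTY for A in R)   # (A3): the inverse of A is A
x_is_identity = all(mul(X, A) == A for A in R)      # the whole set X is the identity
idempotent = all(mul(A, A) == A for A in R)         # every element satisfies AA = A
```

All five checks succeed for this X, which has 8 subsets and hence 512 triples to test.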


2.3. Properties of rings. In this section we prove a few basic properties 
which follow from the ring axioms. 


1. In a ring, we can only add elements two at a time. What if we want to add
more than two elements? We have to put in brackets to convert the sum into
a succession of pairwise additions. However, because of the associative law, the
answer is the same no matter how we put in the brackets. For example, consider
a + b + c + d. There are five possible ways of evaluating this, corresponding to the
five bracketings ((a + b) + c) + d, (a + (b + c)) + d, (a + b) + (c + d), a + ((b + c) + d),
and a + (b + (c + d)). Now (a + b) + c = a + (b + c), so the first two are equal.
Similarly, (b + c) + d = b + (c + d), so the fourth and fifth are equal. Now consider
(a + b) + (c + d). Putting a + b = x, this is x + (c + d) = (x + c) + d = ((a + b) + c) + d;
similarly, putting c + d = y, it works out to a + (b + (c + d)). So all the expressions
are equal.

We usually write a + b + c + d, leaving out the brackets.

In fact, the sum of any number of elements does not depend on the bracketing 
used to work it out. This might seem obvious to you as an extension of the above 
argument. But we can (and should) give a correct formal proof. 


Proposition 2.1 In a ring, the sum a_1 + ··· + a_n of any number of elements
is independent of the bracketing used to work it out.


Proof The proof is by induction on n. For n = 1 and n = 2, there is nothing to
prove. For n = 3, there are just two possible bracketings, namely (a_1 + a_2) + a_3
and a_1 + (a_2 + a_3), and the associative law tells us that they are equal. So let
us assume that the result holds for sums of fewer than n terms, and prove it for
sums of n terms. The induction hypothesis allows us to write a_1 + ··· + a_m for
the sum of m terms whenever m < n.

Now consider two bracketings of the sum of a_1, ..., a_n. In the evaluation of
each bracketing, at the last-but-one stage, we will have two expressions, the sum
of a_1, ..., a_k and the sum of a_{k+1}, ..., a_n, which are then added at the last stage.
By the induction hypothesis, each of these smaller sums is independent of the
bracketing, and so we can write the whole expression as


(a_1 + ··· + a_k) + (a_{k+1} + ··· + a_n),


where 0 < k < n. Similarly, the other expression reduces to


(a_1 + ··· + a_l) + (a_{l+1} + ··· + a_n),


where 0 < l < n.

If k = l, these expressions are clearly equal. So suppose not. We may assume
that k < l. Now, again using the inductive hypothesis, we can write the first as


(a_1 + ··· + a_k) + ((a_{k+1} + ··· + a_l) + (a_{l+1} + ··· + a_n))


and the second as


((a_1 + ··· + a_k) + (a_{k+1} + ··· + a_l)) + (a_{l+1} + ··· + a_n).


But these have the form x + (y + z) and (x + y) + z, where


x = a_1 + ··· + a_k,
y = a_{k+1} + ··· + a_l,
z = a_{l+1} + ··· + a_n;


by the associative law, they are equal.


The argument does not depend on the fact that the operation is called ‘addi- 
tion’, but only on the associative law. So the same is true, for example, for the 
operation of multiplication in a ring. 
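
The recursive structure of this proof (split the terms into a left block and a right block, then combine) also gives a way to list every bracketing. The Python sketch below is an illustration with a function name of my own choosing; it evaluates all bracketings of four terms under an associative operation (joining strings) and under a non-associative one (subtraction of integers).

```python
def bracketings(terms, op):
    """Yield the value of every full bracketing of `terms` under `op`.
    The last operation performed combines a_1..a_k with a_(k+1)..a_n."""
    if len(terms) == 1:
        yield terms[0]
        return
    for k in range(1, len(terms)):
        for left in bracketings(terms[:k], op):
            for right in bracketings(terms[k:], op):
                yield op(left, right)

# Associative example: joining strings. All five bracketings agree.
joined = list(bracketings(["a", "b", "c", "d"], lambda x, y: x + y))

# Non-associative example: subtraction. The bracketings disagree.
differences = set(bracketings([8, 4, 2, 1], lambda x, y: x - y))
```

Here joined has five entries, all equal to "abcd", matching the five bracketings listed earlier, while differences is the set {1, 3, 5, 7}. So the proposition really does depend on the associative law.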


2. Axiom (A2) guarantees that a zero element exists. Could there be more than
one? Suppose that z_1 and z_2 are two zero elements in a ring R; that is, for all
a ∈ R,

a + z_1 = z_1 + a = a = a + z_2 = z_2 + a.

Then we have

z_1 = z_1 + z_2 = z_2.

So the zero element is unique. A very similar argument shows that, in a ring 
with identity, the identity element is unique. 


3. It is also true that inverses (as given by (A3)) are unique. For suppose that b 
and c are both inverses of a. This means that 


a + b = b + a = 0 = a + c = c + a.

Then

b + (a + c) = b + 0 = b,
(b + a) + c = 0 + c = c.


By the associative law, the left-hand sides are equal; so b = c.

We write the inverse of a as −a. Also, we abbreviate a + (−b) to a − b. Notice
that we are falling into mathematical bad habits here: the symbol − is being
used in the first place as a unary operator (taking a to −a), and in the second as
a binary operator (combining a and b to form a − b). In fact, people are used to
this double use and have no trouble with it, but many calculators have different
buttons for the two different uses of −.

Similarly, in a division ring, the multiplicative inverse of a non-zero element
a is unique (and is written a⁻¹, so that aa⁻¹ = a⁻¹a = 1).


4. The cancellation law holds for addition:
(C) (Cancellation Law): If a + c = b + c, then a = b.


Proof Suppose that a + c = b + c. Add −c to each side:


(a + c) − c = (b + c) − c,
a + (c − c) = b + (c − c),
a + 0 = b + 0,
a = b.


5. Here is the proof that the commutative law for addition follows from the other
axioms in a ring with identity.
Expand (1 + 1)(a + b) in two ways. We get


(1 + 1)(a + b) = (1 + 1)a + (1 + 1)b
               = a + a + b + b,
and
(1 + 1)(a + b) = 1(a + b) + 1(a + b)
               = a + b + a + b.


So a + a + b + b = a + b + a + b. Cancelling a from the front and b from the end
gives a + b = b + a, as required.


6. For any a ∈ R, a0 = 0a = 0.


Proof
a(0 + 0) = a0 + a0,
a(0 + 0) = a0 = a0 + 0.


So a0 + a0 = a0 + 0. Now use (C) to cancel a0 from the equation to give a0 = 0.
The proof that 0a = 0 is similar.


Remark This is the reason why axiom (M3) for a division ring only requires
non-zero elements to have multiplicative inverses; for 0 ≠ 1, and x0 = 0 for all
x ∈ R, so there cannot exist b with b0 = 1.


7. For any a ∈ R, −(−a) = a.


Proof −(−a) is the inverse of −a; that is, it is an element which, when added
to −a, gives zero. But we know that a added to −a gives zero. Since inverses are
unique, these two inverses of −a must be equal.


8. For any a, b ∈ R, −(a + b) = −b − a.

Proof We have to show that −b − a is the inverse of a + b. So add it to a + b:

−b − a + (a + b) = −b + (−a + a) + b
                 = −b + 0 + b
                 = −b + b
                 = 0,

as required.


2.4 Subrings. Let R be a ring. A subring of R is a subset S ⊆ R which
itself forms a ring (using the same operations as those in R). 

Let us see what checking the axioms involves in this case. 

(A0): We require closure, that is, for all a, b ∈ S, a + b ∈ S.

(A1): The associative law automatically holds for all a, b, c ∈ S, since it holds
for all a, b, c in the larger set R.

(A2): We require that the zero element of R lies in S.

(A3): For each a ∈ S, we require that −a ∈ S.

(A4): This holds automatically, by the same argument as for (A1).

(M0): We require that S is closed under multiplication.

(M1): This is automatic, as for (A1). The same is true of (D).


We conclude that, of the eight axioms, four are automatically true, just 
because we are looking at a subset of a ring. (These are the axioms assert- 
ing that all elements satisfy some equation.) So we only have to require the two 
closure axioms, the zero and inverse axioms. 

In fact, we can whittle these down to three: 


Theorem 2.2 (First Subring Test) A non-empty subset S of a ring R is a
subring provided that, for all a, b ∈ S, we have a + b, ab, −a ∈ S.

Proof We are given the closure and inverse axioms; we have to show that
0 ∈ S. But S is non-empty, so take any element a ∈ S. Then, by assumption,
−a ∈ S, and so 0 = a + (−a) ∈ S, as required.


We can do even better, reducing the number of tests to two: closure under 
subtraction and multiplication. 


Theorem 2.3 (Second Subring Test) A non-empty subset S of a ring R is
a subring provided that, for all a, b ∈ S, we have a − b, ab ∈ S.


Proof Suppose that S is closed under subtraction and multiplication. To show
it is a subring, we verify the conditions of the First Subring Test. Take an element
a ∈ S. Then a − a = 0 ∈ S; so 0 − a = −a ∈ S, and the inverse law holds. Now
take a, b ∈ S. Then −b ∈ S, and so a − (−b) = a + b ∈ S, and we have closure
under addition. Closure under multiplication is given; so S is a subring.
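
For a finite ring the Second Subring Test is easy to automate. The Python sketch below applies it to subsets of the integers mod n, with arithmetic done by the % operator; the function name is_subring_mod and the example subsets are my own.

```python
def is_subring_mod(S, n):
    """Second Subring Test for a subset S of the integers mod n:
    S must be non-empty and closed under subtraction and multiplication."""
    S = set(S)
    if not S:
        return False
    return all((a - b) % n in S and (a * b) % n in S
               for a in S for b in S)

multiples_of_3 = is_subring_mod({0, 3, 6, 9}, 12)   # closed under both operations
multiples_of_4 = is_subring_mod({0, 4, 8}, 12)      # closed under both operations
not_closed = is_subring_mod({0, 1, 5}, 12)          # 1 - 5 = 8 (mod 12) is missing
```

Only two closure conditions are tested; the zero and inverse laws come for free, exactly as in the proof above.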


Example We find all the subrings of the ring Z of integers.

First, we show that, for any integer m, the set mZ = {mx : x ∈ Z} of all
multiples of m is a subring. Take a, b ∈ mZ; let a = mx, b = my, for some
integers x, y. Then


a − b = m(x − y) ∈ mZ,
ab = m(mxy) ∈ mZ,


so mZ passes the Second Subring Test.

Now we show that every subring of Z is of this form. So let S be a subring.
Certainly 0 ∈ S. If S = {0}, then S = 0Z is of the required form. So suppose
not. If n ∈ S, then also −n ∈ S; so S must contain some positive integer. Let m
be the smallest positive integer in S. We will prove that S = mZ. Proving this
equality involves showing that each element of one set is in the other and vice
versa.

First, take any element of mZ, say mx. If x = 0, then mx = 0 ∈ S. If
x > 0, then mx = m + m + ··· + m (x terms), and m ∈ S; so mx ∈ S
by closure. If x < 0, let x = −y. Then my ∈ S as above, and then
mx = −my ∈ S.

Conversely, take any element of S, say n. By the Division Algorithm for
integers, we can divide n by m, obtaining a quotient q and remainder r; thus,
n = mq + r and 0 ≤ r < m. Now n ∈ S and mq ∈ S, so r = n − mq ∈ S. If
r > 0, we have a contradiction to the fact that m is the smallest positive integer
in S. So, necessarily, r = 0 and n = mq ∈ mZ.
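
The remainder step in this argument can be turned into a calculation. Suppose a subring S of Z is known to contain certain integers. Replacing an element by its remainder on division by a smaller one, again and again, produces smaller and smaller positive elements of S, and the process stops at the greatest common divisor. The Python sketch below follows this idea; the function name smallest_positive is my own, and the code assumes (as a standard fact, not proved in the text above) that the subring generated by the given integers consists of the multiples of the result.

```python
def smallest_positive(generators):
    """Repeatedly replace n by its remainder r = n - m*q, as in the proof.
    Every value produced lies in any subring containing the generators.
    Returns 0 if all the generators are 0."""
    m = 0
    for n in generators:
        n = abs(n)          # -n lies in the subring whenever n does
        while n:
            m, n = n, m % n  # Euclid's algorithm: keep taking remainders
    return m

m = smallest_positive([12, 18, -30])   # 6
```

So any subring of Z containing 12, 18, and −30 contains 6, and hence contains 6Z.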


Remark If you are asked to prove that something is a ring, it is usually much 
easier to recognise that it is a subset of a structure known to be a ring, and then 
apply one of the subring tests, than it is to check the eight ring axioms directly. 
Bear this in mind when you tackle Problems 2.1 and 2.3. 
Exercise 2.1 Which of the following sets are rings (with the usual addition and
multiplication):


(a) the natural numbers;

(b) the real polynomials of degree at most n;

(c) all polynomials with integer coefficients;

(d) all polynomials with integer coefficients and constant term zero;

(e) polynomials with integer coefficients and degree at most four;

(f) all real polynomials f such that f(2) = 0;

(g) all integers divisible by 3;

(h) all non-singular 2 x 2 real matrices;

(i) all complex numbers of the form a + bi for a, b ∈ Z;

(j) all real functions of the form f(x) = ax + b for a, b ∈ R.

Exercise 2.2 Let R be a ring, and let M_2(R) denote the set of all 2 x 2 matrices with
elements from R. Define addition and multiplication of 2 x 2 matrices by the usual
rules:


( a  b )   ( e  f )   ( a + e   b + f )
( c  d ) + ( g  h ) = ( c + g   d + h ),


( a  b ) ( e  f )   ( ae + bg   af + bh )
( c  d ) ( g  h ) = ( ce + dg   cf + dh ).


Prove carefully that M_2(R) is a ring.


Exercise 2.3 Which of the following sets of 2 x 2 matrices over the real numbers 
are rings (with the addition and multiplication defined in Problem 2.2)? Which are 
commutative? Which have an identity? Which are division rings? 


(a) The set of all symmetric matrices (matrices A satisfying Aᵀ = A).
(b) The set of all skew-symmetric matrices (matrices A satisfying Aᵀ = −A).


(c) The set of all upper-triangular matrices (matrices of the form

( a  b )
( 0  d ) ).


(d) The set of all strictly upper-triangular matrices (matrices of the form

( 0  b )
( 0  0 ) ).


(e) The set of matrices of the form G a 


The transpose Aᵀ of a matrix

    ( a  b )                      ( a  c )
A = ( c  d )  is given by  Aᵀ =  ( b  d ).


Exercise 2.4 Prove that, in any ring R,


(a_1 + a_2 + ··· + a_m)(b_1 + b_2 + ··· + b_n) = a_1b_1 + a_1b_2 + ··· + a_1b_n
                                               + a_2b_1 + ··· + a_2b_n
                                               + ···
                                               + a_mb_1 + ··· + a_mb_n.

Exercise 2.5 Let R be a ring. For a positive integer n, let n · x denote x + ··· + x (n
terms). Prove that

(a) (m + n) · x = m · x + n · x and (mn) · x = m · (n · x);
(b) if 1 is an identity and n · 1 = 0, then n · x = 0 for all x.


Exercise 2.6 Let x and y be elements of a ring R, and suppose that xy = yx. Prove
the Binomial Theorem for n > 0:


(x + y)^n = Σ_{i=0}^{n} (n choose i) · x^(n−i) y^i.


[As in the preceding question, m · a = a + ··· + a (m terms).]


Exercise 2.7 Let R be a ring with identity element 1. 


(a) Prove that (−1) · x = −x, where −1 and −x are the additive inverses of 1 and x.

(b) Show that −(x + y) = −y − x.

(c) Hence show that the commutativity of addition can be deduced from the other 
axioms for a ring with identity. 


Show that this is false if no identity element exists. [Hint: Let all products be zero.] 


Exercise 2.8 Let R be a ring in which every element x satisfies x² = x (where x²
means xx).


(a) By evaluating (x + x)², show that x + x = 0 for all x ∈ R.
(b) By evaluating (x + y)², show that R is commutative.


Remark Any Boolean ring satisfies the condition x² = x for all x ∈ R. It can
be shown that any finite ring satisfying this condition is a Boolean ring.

In the spirit of abstract algebra, we will re-define the term Boolean ring to
mean a ring R with identity satisfying x² = x for all x ∈ R.


Exercise 2.9 Let R and S be rings. Define operations on R x S (the set of ordered 
pairs) by the rules 


(r_1, s_1) + (r_2, s_2) = (r_1 + r_2, s_1 + s_2),


(r_1, s_1)(r_2, s_2) = (r_1 r_2, s_1 s_2).


Prove that R x S is a ring. Show further that R x S is commutative if and only if R
and S are commutative, and that R x S has an identity if and only if R and S do. Can
R x S ever be a field?


Remark R x S is known as the direct product or direct sum of the two 
rings R and S. 


Exercise 2.10 (a) Let R be a ring in which the elements are the integers, and the 
addition is the same as in Z. Is it possible that the multiplication is different from that 
in Z? Can you describe all such rings? 

(b) (**) Let R be a ring in which the elements are the integers, and the mul- 
tiplication is the same as in Z. Is it possible that the addition is different from 
that in Z? 
(Part (b) is much more difficult than part (a); you are not expected to solve it at
this stage. We will return to this later.) 


Exercise 2.11 (*) The field C of complex numbers was described in Chapter 1
as the set of all objects a + bi, where a and b are real numbers, where addition
and multiplication are defined according to ‘the usual rules’ subject to the extra
condition that i² = −1. Hamilton constructed a larger number system, the
quaternions, as follows. The elements of H are all objects of the form a + bi + cj + dk.
Addition and multiplication are defined according to the usual rules, subject to the
condition that multiplication of the elements 1, i, j, k works according to the following
table:


· |  1    i    j    k
1 |  1    i    j    k
i |  i   −1    k   −j
j |  j   −k   −1    i
k |  k    j   −i   −1


Prove that H is a non-commutative division ring. [Hint: Define the conjugate of
the element z = a + bi + cj + dk to be z* = a − bi − cj − dk.] Prove that


zz* = (a² + b² + c² + d²) · 1.


Homomorphisms and ideals 


Two rings are essentially the same for the purposes of algebra (the technical 
term is ‘isomorphic’) if we can match up their elements in such a way that 
addition and multiplication correspond; that is, if a corresponds to a′ and b to
b′, then a + b corresponds to a′ + b′ and ab to a′b′. In this section, we define a
weaker relationship between rings, which merely asserts that they are somewhat 
alike. 


2.5 Cosets. The first topic seems to be a digression, but its relevance will 
become clear soon. Let S be a subring of the ring R. We will partition R into 
subsets called cosets of S. 

Define a relation E on the set R by the rule that (a, b) ∈ E if b − a ∈ S. I
claim that E is an equivalence relation. It is


• reflexive, since a − a = 0 ∈ S;
• symmetric, since if b − a ∈ S then a − b = −(b − a) ∈ S;
• transitive, since if b − a ∈ S and c − b ∈ S then c − a = (c − b) + (b − a) ∈ S.


So it is indeed an equivalence relation. 

By the Equivalence Relation Theorem, R is partitioned into equivalence
classes E(a), where E(a) = {b : (a, b) ∈ E}. These equivalence classes are called
the cosets of S in R. We examine them a bit more closely, and observe that


E(a) = S + a = {s + a : s ∈ S}.

To see this, first take b ∈ S + a. Then b = s + a for some s ∈ S, so b − a = s ∈ S,
whence (a, b) ∈ E and b ∈ E(a). Conversely, if b ∈ E(a), then b − a ∈ S. Putting
s = b − a, we have b = s + a ∈ S + a as required.


Example Let R = Z and S = 4Z. Then


S = S + 0 = {..., −8, −4, 0, 4, 8, ...},
S + 1 = {..., −7, −3, 1, 5, 9, ...},
S + 2 = {..., −6, −2, 2, 6, 10, ...},
S + 3 = {..., −5, −1, 3, 7, 11, ...},


and S + 4 = S, at which point the sequence repeats. (Compare the example in
Section 1.4.3.)


More generally, if R = Z and S = nZ, where n is a positive integer, then the 
coset nZ + a is the set of all integers congruent to a mod n. Thus, the cosets are 
the congruence classes mod n, and there are n of them altogether. 
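The cosets of nZ are infinite sets, so a program can only display a finite window of each. The sketch below, with function names of my own choosing, groups the integers in a window by the relation ‘b − a ∈ nZ’ and labels each coset by its least non-negative representative.

```python
def same_coset(a, b, n):
    """True if b - a lies in nZ, i.e. a and b lie in the same coset of nZ."""
    return (b - a) % n == 0


def cosets_in_window(n, lo, hi):
    """Group the integers lo..hi (inclusive) by their coset of nZ.

    Each coset nZ + a is labelled by the representative a % n.
    """
    classes = {}
    for b in range(lo, hi + 1):
        classes.setdefault(b % n, []).append(b)
    return classes


window = cosets_in_window(4, -8, 11)
# window[1] lists the part of 4Z + 1 that falls inside the window.
```

Note that Python's `%` operator returns a non-negative remainder for a positive modulus, so negative integers are placed in the correct coset.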

The element a is called a coset representative for the coset S + a. Note 
that the system is perfectly democratic: any element of a coset can serve as its 
representative. (Actually, this is very slightly misleading in one case. The subring 
S is a coset of itself, namely S +0; and while we could use any of its elements 
as a representative, it is most natural to use the element 0.) 


2.6 Homomorphisms and ideals. I introduce these two slightly strange 
words by means of an example. Suppose that I am short-sighted (actually this is 
correct), and that when I look at an integer I can only see whether it is even or 
odd. I will not know much about the integers, but I will know enough to make 
some consistent statements about addition and multiplication: for example, even 
plus even equals even. My knowledge can be summarised in tables: 


  +  | even  odd           ·  | even  even


This looks very much like the two-element ring of Example 5 in Section 2.2. The 
point is that it is indeed a ring, and captures a little bit of the ‘shape’ of the 
ring of integers. 

We define a homomorphism from a ring R to a ring S to be a function or
map θ: R → S which satisfies


θ(a + b) = θ(a) + θ(b),
θ(ab) = θ(a)θ(b)


for all a, b ∈ R. Note that the addition and multiplication on the left of these
equations are the operations in R, while those on the right are the operations in S.
Having said this, I will perversely change notation right away. There are good
reasons for writing the result of applying θ to a, not as θ(a), but as aθ. With this
notation, we say that the map θ is written on the right. One reason is that, in
algebra, we often have to apply a function θ followed by another function φ: this
is written as aθφ, whereas if we wrote functions on the left we would have to
say φ(θ(a)), and always remember to reverse the order. From now on, functions
with an algebraic significance, such as homomorphisms, will always be written
on the right; while functions with no such significance, such as polynomials, will
be written on the left. So f(x) = x² + 1 is a polynomial, and fθ is the result of
applying to it some homomorphism from the ring of all polynomials to another
ring. Confused? Remember that not everybody uses this convention!

Let us rewrite the definition of a homomorphism in the new notation:
θ: R → S is a homomorphism if


(a + b)θ = aθ + bθ,
(ab)θ = (aθ)(bθ).


The word homomorphism means ‘similar shape’; this is meant to suggest that 
some of the ‘shape’ of the ring R is captured by S, as in our example. In this 
terminology, if S is the ring {0,1} of Example 5 of Section 2.2, then the function 
θ: Z → S defined by

nθ = 0 if n is even,
nθ = 1 if n is odd,


is a homomorphism. 
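The two defining conditions can be tested mechanically on a sample of integers. In the sketch below I assume that the two-element ring {0, 1} is the integers mod 2 (so that 1 + 1 = 0); the function name `theta` simply mirrors the map in the text. A finite check is only an illustration, not a proof.

```python
def theta(n):
    """The parity map from Z to {0, 1}: 0 for even n, 1 for odd n."""
    return n % 2


sample = range(-10, 11)

# (a + b)theta = a theta + b theta, with the right-hand sum taken mod 2.
additive = all(
    theta(a + b) == (theta(a) + theta(b)) % 2 for a in sample for b in sample
)

# (ab)theta = (a theta)(b theta), with the right-hand product taken mod 2.
multiplicative = all(
    theta(a * b) == (theta(a) * theta(b)) % 2 for a in sample for b in sample
)
```

The operations on the left of each comparison are those of Z, while those on the right are carried out in the two-element ring, exactly as in the definition.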

A homomorphism which is also a bijection (a one-to-one and onto function) 
is called an isomorphism. If there is an isomorphism from R to S, we say that 
the rings R and S are isomorphic. This means that they are ‘matched up’ by 
the function θ in such a way that the addition and multiplication are the same.
So, from the point of view of abstract algebra, we will regard the two rings as
being the same, even if their actual elements are quite different (one ring might
consist of matrices and the other of polynomials, say). We denote ‘R and S are
isomorphic’ by R ≅ S.

Any homomorphism θ: R → S has the additional properties


0θ = 0,
(a − b)θ = aθ − bθ


for all a, b ∈ R. The first equation follows from

0θ + 0θ = (0 + 0)θ = 0θ = 0θ + 0

by using the Cancellation Law in S. The second follows from the fact that


(a − b)θ + bθ = (a − b + b)θ = aθ.

Our main task in this section is to see just how much a homomorphism blurs 
the structure of a ring, and how much shape is preserved. 


Example We will find all homomorphisms from the ring Z to itself. Let θ be
a homomorphism. Suppose that 1θ = n. Then 2θ = (1 + 1)θ = n + n = 2n, and
similarly (by induction), mθ = mn for all positive integers m. Moreover, 0θ = 0,
and for positive m we have (−m)θ = −(mθ) = −mn. So θ multiplies every integer
by n.

So far, we have only used the additive property. Now we turn to multiplication,
and observe that


n = 1θ = (1 · 1)θ = 1θ · 1θ = n · n = n²,


so that n = 0 or n = 1. So there are only two homomorphisms, namely


• θ₀: x ↦ 0 for all x;
• θ₁: x ↦ x for all x.


In fact, these rules define ‘trivial’ homomorphisms on any ring R. So our favourite 
ring Z is somewhat poor in homomorphisms: the only homomorphisms from Z 
to itself are the trivial ones possessed by all rings. (Of course, as we saw, there 
are homomorphisms from Z to other rings.) 
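The argument of the example can be mirrored by a small search. Since every additive map from Z to itself is multiplication by some integer n, the sketch below tests which values of n also respect multiplication on a finite sample; the function name and the ranges searched are arbitrary choices of mine.

```python
def is_ring_hom_on_sample(n, sample):
    """Check whether x -> n*x respects + and * for all pairs in the sample."""
    def theta(x):
        return n * x

    return all(
        theta(a + b) == theta(a) + theta(b)
        and theta(a * b) == theta(a) * theta(b)
        for a in sample
        for b in sample
    )


candidates = [n for n in range(-5, 6) if is_ring_hom_on_sample(n, range(-4, 5))]
# Only n = 0 and n = 1 survive: the pair a = b = 1 already forces n = n*n.
```

The additive condition holds for every n; it is the multiplicative condition that rules out all but the two trivial maps.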


A homomorphism θ: R → S is a function, and so has an image and a kernel
in the sense of Section 1.18. As promised there, we simplify the definition of the
kernel slightly.


Definition Let θ: R → S be a homomorphism of rings. The image of θ is

Im(θ) = {s ∈ S : s = rθ for some r ∈ R},

and the kernel of θ is

Ker(θ) = {r ∈ R : rθ = 0}.


Remark In Section 1.18, the kernel of a function was defined to be the equivalence
relation in which two elements are equivalent if they have the same image.
So Ker(θ) is the equivalence class containing 0 of the relation KER(θ).


Proposition 2.4 Let θ: R → S be a ring homomorphism.


(a) Im(θ) is a subring of S.

(b) Ker(θ) is a subring of R which has the additional property that, for any
x ∈ Ker(θ) and r ∈ R, we have rx, xr ∈ Ker(θ).

(c) Two elements of R are mapped to the same element of S under θ if and
only if they lie in the same coset of Ker(θ).


Part (c) of this result states that the equivalence classes of the equivalence
relation KER(θ) are precisely the cosets of the subring Ker(θ). This is why we
use the simpler definition: we can obtain the entire kernel (in the first sense)
from the subring Ker(θ). This also shows an important property of cosets.


Proof (a) We apply the subring test. Take s₁, s₂ ∈ Im(θ). Then s₁ = r₁θ and
s₂ = r₂θ, for some r₁, r₂ ∈ R. Then

s₁ − s₂ = r₁θ − r₂θ = (r₁ − r₂)θ ∈ Im(θ),
s₁s₂ = (r₁θ)(r₂θ) = (r₁r₂)θ ∈ Im(θ);

so Im(θ) is a subring of S.

(b) Similarly, take r₁, r₂ ∈ Ker(θ). Then r₁θ = r₂θ = 0; so

(r₁ − r₂)θ = r₁θ − r₂θ = 0 − 0 = 0,
(r₁r₂)θ = (r₁θ)(r₂θ) = 0 · 0 = 0;

so r₁ − r₂, r₁r₂ ∈ Ker(θ), and Ker(θ) is a subring.
Now we check the extra condition. Suppose that x ∈ Ker(θ) and r ∈ R. Then


(rx)θ = (rθ)(xθ) = rθ · 0 = 0,


and so rx ∈ Ker(θ). Similarly, xr ∈ Ker(θ).

(c) Suppose that r₁θ = r₂θ. Then (r₂ − r₁)θ = 0, so x = r₂ − r₁ ∈ Ker(θ);
then r₂ ∈ Ker(θ) + r₁, so r₁ and r₂ lie in the same coset of Ker(θ). Conversely,
if r₁ and r₂ lie in the same coset, say r₂ = r₁ + x with x ∈ Ker(θ), then
r₂θ = r₁θ + xθ = r₁θ, since x ∈ Ker(θ).


The extra property of the subring Ker(θ) is so important that it is given
a special name. An ideal of a ring R is a subring S of R such that, for any
s ∈ S and r ∈ R, we have rs, sr ∈ S. The term was invented by Kummer, who
invented ‘ideal numbers’ in an attempt to correct a mistake in his attempted 
proof of Fermat’s Last Theorem. Unfortunately, he did not succeed, and we had 
to wait another hundred years until Fermat’s Last Theorem was proved; but the 
concept of an ideal is crucial for ring theory. 

To test for an ideal, we should test for a subring and then check the extra 
condition. But we can simplify things: 


Theorem 2.5 (Ideal Test) A non-empty subset S of a ring R is an ideal of 
R if and only if 


(a) for all s₁, s₂ ∈ S, we have s₁ − s₂ ∈ S;
(b) for all s ∈ S and r ∈ R, we have rs, sr ∈ S.


Proof All that is missing is closure under multiplication; but this is just the
special case of (b) corresponding to the case in which r ∈ S.
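The Ideal Test translates directly into a finite check when the ring is small. The sketch below applies it to subsets of the integers mod n, where multiplication is commutative, so only products rs need to be tested; the function names are illustrative, and the brute-force search over all subsets is only sensible for small n.

```python
from itertools import combinations


def is_ideal(subset, n):
    """Ideal Test for a subset of the integers mod n."""
    s = set(subset)
    if not s:
        return False  # the test requires a non-empty subset
    # (a) closed under subtraction
    closed_under_subtraction = all((a - b) % n in s for a in s for b in s)
    # (b) absorbs multiplication by arbitrary ring elements
    absorbs_products = all((r * a) % n in s for a in s for r in range(n))
    return closed_under_subtraction and absorbs_products


def all_ideals(n):
    """Every ideal of the integers mod n, found by exhaustive search."""
    return [
        set(c)
        for size in range(1, n + 1)
        for c in combinations(range(n), size)
        if is_ideal(c, n)
    ]


ideals_of_z12 = all_ideals(12)
```

For n = 12 the search finds one ideal for each divisor of 12, namely the sets of multiples of 1, 2, 3, 4, 6 and 12 (the last being {0}).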


So we can say more briefly: 


The kernel of a homomorphism θ: R → S is an ideal of R. 




Example We found in Section 2.4 that the subrings of Z are all the sets of
the form nZ for n ∈ Z, where nZ consists of all multiples of n. Now all of these
subrings are ideals. For take r ∈ Z and s ∈ nZ, say s = nx for some x ∈ Z. Then
rs = sr = n(rx) ∈ nZ. So we have found all the ideals of Z.


2.7 Should a ring have an identity? What are the reasons for not 
requiring the existence of an identity in a ring? 

First, it is convenient that an ideal is a particular kind of subring. But if rings 
are required to have identities, then all their ideals (with one exception) fail to 
be subrings! (In the next chapter, we will see a close analogy between groups, 
subgroups, and normal subgroups on one hand, and rings, subrings, and ideals 
on the other. This analogy would fail if ideals were not subrings.) 


Proposition 2.6 Let R be a ring with identity element 1, and I an ideal 
containing 1. Then I = R. 


Proof For any element r ∈ R, we have r = 1 · r ∈ I. 


Second, there are several important examples of rings which are used in 
various branches of mathematics and which do not contain identities. For 
example, 


• C₀(R), the ring of continuous real-valued functions with bounded support;
• the direct sum of an infinite collection of rings (even if all the factors have
  identities).


Third, the argument that a more general definition is better. In topology, 
there was a debate over whether a topological space should be required to satisfy 
the ‘Hausdorff condition’ or not; this was resolved in favour of the more general 
approach. 

Finally, the traditional defence: if a ring does not have an identity, we can 
put one in! 


Proposition 2.7 Let R be a ring. Then there is a ring R* with identity which 
contains a subring isomorphic to R. 


Proof Let

R* = R × Z = {(r, n) : r ∈ R, n ∈ Z},

and define addition and multiplication on R* by the rules


(r₁, n₁) + (r₂, n₂) = (r₁ + r₂, n₁ + n₂),
(r₁, n₁) · (r₂, n₂) = (r₁r₂ + n₂r₁ + n₁r₂, n₁n₂),


where the product nr of an integer and a ring element is defined as in Exercise 2.5
for positive n (that is, the sum of n copies of r), with 0r = 0 and (−m)r = −(mr)
for n = −m < 0. [Warning: This is not the direct product of rings defined in
Exercise 2.9.]
Now a small amount of checking shows that


• R* is a ring;
• (0, 1) is the identity of R*;
• S = {(r, 0) : r ∈ R} is a subring of R* (in fact, it is an ideal);
• the map r ↦ (r, 0) is an isomorphism from R to S.


Of course, if R already has an identity element, then this element is no longer 
the identity of R*. 
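The ‘small amount of checking’ can be sampled by machine. As an illustration I take R to be the ring 2Z of even integers, which has no identity; this choice, and the names `add`, `mul` and `ONE`, are assumptions of the sketch. Because R sits inside Z here, the integer multiple nr is just the ordinary product.

```python
def add(p, q):
    """Addition in R* = R x Z: componentwise."""
    (r1, n1), (r2, n2) = p, q
    return (r1 + r2, n1 + n2)


def mul(p, q):
    """Multiplication in R*: (r1, n1)(r2, n2) = (r1 r2 + n2 r1 + n1 r2, n1 n2)."""
    (r1, n1), (r2, n2) = p, q
    return (r1 * r2 + n2 * r1 + n1 * r2, n1 * n2)


ONE = (0, 1)  # the claimed identity of R*

# Sample elements (r, n) with r even, i.e. r in R = 2Z.
sample = [(r, n) for r in (-4, 0, 2, 6) for n in (-2, 0, 1, 3)]

identity_ok = all(mul(p, ONE) == p == mul(ONE, p) for p in sample)
associative_ok = all(
    mul(mul(p, q), s) == mul(p, mul(q, s))
    for p in sample for q in sample for s in sample
)
distributive_ok = all(
    mul(p, add(q, s)) == add(mul(p, q), mul(p, s))
    for p in sample for q in sample for s in sample
)
# The copy {(r, 0)} of R multiplies exactly as R does.
embedding_ok = all(
    mul((r, 0), (s, 0)) == (r * s, 0) for r in (-4, 2, 6) for s in (-2, 4)
)
```

The second coordinate records ‘how many copies of the new identity’ an element contains, which is why the cross terms n₂r₁ and n₁r₂ appear in the product.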


2.8 Factor rings and isomorphism theorems. For any integer n, the set 
nZ of multiples of n is an ideal of Z (and all ideals are of this form). We saw 
earlier that the cosets of nZ are just the congruence classes mod n. But we can 
do more: we can add and multiply integers mod n. This means that, if we take 
two congruence classes nZ+ x and nZ+ y, then the sum of any integer from the 
first class and any integer from the second will always lie in the congruence class 
nZ + (x+y); and, similarly, the product of integers from these classes will lie in 
nZ+ xy. In this way, we can define operations of addition and multiplication on 
the set of congruence classes mod n (a finite set with n elements). With these 
operations, the set of congruence classes becomes a ring. 

This all works much more generally; for any ideal I of a ring R, it is possible
to make the set of cosets of I in R into a ring, called the ‘factor ring’ of R by I
and denoted R/I, as follows.


Definition Let I be an ideal in the ring R. The factor ring or quotient ring
R/I is the set of cosets of I in R, with operations of addition and multiplication
defined by


(I + x) + (I + y) = I + (x + y),
(I + x)(I + y) = I + xy.


Theorem 2.8 The factor ring, as defined above, is indeed a ring.


Proof Before we verify the axioms, there is one very important thing to check:
that the definition is a good one. On the face of it, the definition depends on the
choice of coset representatives. It is not clear that, if x₁ and x₂ are two
representatives of a coset of I, and y₁ and y₂ are two representatives of another coset,
then x₁ + y₁ and x₂ + y₂ lie in the same coset (and similarly for multiplication).
Suppose that we have such elements. Then x₂ = x₁ + a and y₂ = y₁ + b, for some
a, b ∈ I. Then


(x₂ + y₂) − (x₁ + y₁) = (x₁ + a + y₁ + b) − (x₁ + y₁) = a + b ∈ I,
(x₂y₂) − (x₁y₁) = (x₁ + a)(y₁ + b) − (x₁y₁) = x₁b + ay₁ + ab ∈ I,

where, in the last step, ab ∈ I since I is a subring, and x₁b, ay₁ ∈ I since I is an
ideal. So the operations on R/I are indeed well defined.

Now the rest of the proof involves verifying the axioms, which is routine.
The closure laws need no proof, since we have well-defined operations. For the
associative law (A1), we have


((I + x) + (I + y)) + (I + z) = (I + (x + y)) + (I + z)
                              = I + ((x + y) + z)
                              = I + (x + (y + z))
                              = (I + x) + (I + (y + z))
                              = (I + x) + ((I + y) + (I + z)).


The proofs of (A4), (M1), and (D) are very similar. The zero element is I + 0 = I,
and the inverse of I + x is I + (−x) (which, of course, we write as I − x).


You were warned in the last chapter that a set of sets is difficult to think 
about, especially if we have to perform operations on its elements. Here is 
precisely that situation. But it is so important that it is worth taking some 
trouble to grasp the ideas. 

You may recognise that, if R is the ring Z of integers and I is the set mZ of all
multiples of the positive integer m, then the ring R/I is precisely the structure
we called ‘the integers mod m’ in Chapter 1, and denoted by Z_m. We will use
the same notation here. The integers mod m provide a very good example of a 
factor ring. 
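For the integers mod m, the well-definedness check in the proof of Theorem 2.8 can be carried out on examples. In the sketch below a coset mZ + x is stored by its least non-negative representative; the function names are mine, and the test changes each representative by a few multiples of m to confirm that the resulting coset is unaffected.

```python
def coset(x, m):
    """Label of the coset mZ + x: its least non-negative representative."""
    return x % m


def add_cosets(x, y, m):
    """(mZ + x) + (mZ + y) = mZ + (x + y), computed from representatives."""
    return coset(x + y, m)


def mul_cosets(x, y, m):
    """(mZ + x)(mZ + y) = mZ + xy, computed from representatives."""
    return coset(x * y, m)


def operations_well_defined(m, shifts=range(-3, 4)):
    """Replace x, y by other representatives x + s*m, y + t*m and compare."""
    for x in range(m):
        for y in range(m):
            for s in shifts:
                for t in shifts:
                    x2, y2 = x + s * m, y + t * m
                    if add_cosets(x2, y2, m) != add_cosets(x, y, m):
                        return False
                    if mul_cosets(x2, y2, m) != mul_cosets(x, y, m):
                        return False
    return True


check = operations_well_defined(6)
```

The check succeeds for the same reason as in the proof: changing representatives alters the sum and product only by elements of the ideal mZ.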

The factor ring comes as the image of a natural homomorphism, as follows.
Remember that the elements of R/I are the cosets of I in R. Now define a map
θ: R → R/I by the rule that xθ = I + x for all x ∈ R. Checking that θ is
a homomorphism is straightforward (we could say that the definitions of
addition and multiplication in R/I were chosen to make this work):


(x + y)θ = I + (x + y) = (I + x) + (I + y) = xθ + yθ,
(xy)θ = I + xy = (I + x)(I + y) = (xθ)(yθ).


The image of θ is R/I, since every coset has the form I + x for some x ∈ R.
What is the kernel of θ? Since the zero element of R/I is the coset I, we have


Ker(θ) = {x ∈ R : I + x = I} = {x ∈ R : x ∈ I} = I.


We call the map θ the canonical homomorphism from the ring R to its factor
ring R/I. Hence we have proved the following:


Theorem 2.9 The canonical homomorphism θ: R → R/I defined by xθ =
I + x for x ∈ R is indeed a homomorphism; its image is R/I and its kernel is I.


Armed with the concept of factor rings and the canonical homomorphism, we 
can return to our analysis of the image and kernel of an arbitrary homomorphism. 




Theorem 2.10 (First Isomorphism Theorem) Let θ: R → S be a ring
homomorphism. Then


(a) Im(θ) is a subring of S;
(b) Ker(θ) is an ideal of R;
(c) R/Ker(θ) ≅ Im(θ).


Proof We have already shown (a) and (b). For (c), there is only one reasonable
definition of a map φ from R/I to S, where I = Ker(θ): we must put (I + x)φ = xθ
for all x ∈ R. As usual, we have to show that this is well defined. So let x₁ and x₂ be
representatives of the same coset of I, so that x₂ = x₁ + a for some a ∈ I. Then


x₂θ = (x₁ + a)θ = x₁θ + aθ = x₁θ,


since aθ = 0; so (I + x₁)φ = (I + x₂)φ, and φ is indeed well defined.
To show that φ is a homomorphism, we have


(I + (x + y))φ = (x + y)θ = xθ + yθ = (I + x)φ + (I + y)φ,
(I + xy)φ = (xy)θ = (xθ)(yθ) = ((I + x)φ)((I + y)φ).


Now φ is clearly onto Im(θ), since for any s ∈ Im(θ) we have s = xθ = (I + x)φ
for some x ∈ R. Finally, suppose that (I + x)φ = (I + y)φ. Then xθ = yθ, so
(y − x)θ = 0; thus y − x ∈ Ker(θ), and x and y represent the same coset of Ker(θ).
So φ is one-to-one, and hence is an isomorphism from R/Ker(θ) to Im(θ).
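A small finite example may make the construction of φ more concrete. The sketch below uses a homomorphism of my own choosing, not one from the text: the map x ↦ 4x on the integers mod 12, which respects multiplication because 4 · 4 = 16 is congruent to 4 mod 12. It computes the kernel, the cosets of the kernel and the induced map from cosets to the image.

```python
n = 12
R = range(n)


def theta(x):
    """Hypothetical example map on the integers mod 12: x -> 4x."""
    return (4 * x) % n


is_hom = all(
    theta((a + b) % n) == (theta(a) + theta(b)) % n
    and theta((a * b) % n) == (theta(a) * theta(b)) % n
    for a in R
    for b in R
)

kernel = frozenset(x for x in R if theta(x) == 0)
image = {theta(x) for x in R}

# The cosets Ker(theta) + x, each stored as a frozenset of elements.
cosets = {frozenset((k + x) % n for k in kernel) for x in R}

# theta is constant on each coset, so phi can be defined via any representative.
well_defined = all(len({theta(x) for x in c}) == 1 for c in cosets)
phi = {c: theta(min(c)) for c in cosets}

# phi is one-to-one and onto the image of theta.
bijective = len(set(phi.values())) == len(cosets) and set(phi.values()) == image
```

Here the kernel is {0, 3, 6, 9}, there are three cosets, and the image {0, 4, 8} has three elements, in agreement with part (c) of the theorem.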


There are two further ‘Isomorphism Theorems’ relating a ring R to a factor
ring R/I.


Theorem 2.11 (Second Isomorphism Theorem) Let I be an ideal of R.
There is a one-to-one correspondence between the set of subrings of R which
contain I and the set of subrings of R/I. Under this correspondence, ideals of R
containing I correspond to ideals of R/I.


Proof If S is a subring of R containing I, then any coset of I with a
representative in S is completely contained in S. (For, if I ⊆ S and x ∈ S,
then I + x ⊆ S by closure of S.) Moreover, I is an ideal of S, since it is closed
under subtraction and under multiplication by elements of S. So the factor ring
S/I is the set of all cosets of I in S, and is a subring of R/I (as it is a ring in
its own right). Conversely, let T̄ be a subring of R/I. Then T̄ is a set of cosets
of I; the union of all these cosets is a subset T of R, which is easily seen to be
a subring of R containing I. Hence we have the one-to-one correspondence. The
further statement about ideals is an easy exercise.


Theorem 2.12 (Third Isomorphism Theorem) Let I be an ideal of R and 
S a subring of R. Then 


(a) I + S = {a + s : a ∈ I, s ∈ S} is a subring of R containing I;
(b) I ∩ S is an ideal of S;
(c) S/(I ∩ S) ≅ (I + S)/I.

Proof All this can be proved directly; but a little trick, based on the natural
homomorphism θ: R → R/I, makes it easier. We let φ be the restriction of θ to
S. That is, φ maps S to R/I, and the value sφ for s ∈ S is just sθ. (We simply
forget how θ acts on elements outside S.) Clearly, φ is a homomorphism: the two
conditions in the definition hold for arbitrary elements of R, and so certainly for
all elements of S.


(a) What is the image of φ? We see that Im(φ) consists of all cosets
I + s for which the representative is in S. These form a subring of R/I, by
Theorem 2.10(a). The union of all these cosets is the set


{a + s : a ∈ I, s ∈ S} = I + S


by Theorem 2.11; so I + S is a subring of R which contains I. Incidentally, we
see that Im(φ) = (I + S)/I.

(b) What is the kernel of φ? We see that Ker(φ) consists of all the elements
of S mapped to zero by θ. Since Ker(θ) = I, we have Ker(φ) = I ∩ S, which is
thus an ideal of S, by Theorem 2.10(b).

(c) By Theorem 2.10(c), S/Ker(φ) ≅ Im(φ); that is, S/(I ∩ S) ≅ (I + S)/I,
as required.


These theorems are quite abstract, and the proofs are very condensed. Here 
is an example in detail. 


Example Let R = Z, and let I be the ideal 4Z. The cosets of I are the
congruence classes mod 4. For simplicity, we will write the class 4Z + k as k (being
careful to distinguish between the integer and the coset). Now the addition and
multiplication tables of Z₄ = Z/4Z are as follows (ignore the underlines, written
here as _0_ and _2_, for the moment):


  +  | _0_   1   _2_   3          ·  | _0_   1   _2_   3
 _0_ | _0_   1   _2_   3         _0_ | _0_   0   _0_   0
  1  |  1    2    3    0          1  |  0    1    2    3
 _2_ | _2_   3   _0_   1         _2_ | _0_   2   _0_   2
  3  |  3    0    1    2          3  |  0    3    2    1


Now 2Z is a subring of Z containing 4Z: the corresponding subring of Z/4Z is 
the set of cosets containing even numbers. These are the underlined cosets in the 
tables above: inspection of the tables shows that we do indeed have closure, so 
that {0,2} is a subring of Z/4Z, as it should be by Theorem 2.11. (Indeed, it is 
an ideal, also in accordance with that Theorem.) 

Let S = 6Z, a subring of R = Z, and I the ideal 4Z. Then


I + S = {4x + 6y : x, y ∈ Z} = 2Z,


the subring of R containing I described above. Also, 4Z ∩ 6Z = 12Z, since
an integer is divisible by both 4 and 6 if and only if it is divisible by 12. So
Theorem 2.12 asserts that 6Z/12Z ≅ 2Z/4Z. The second of these factor rings
consists of the underlined elements in the above tables; the first has the following
tables:


 + | 0  6          · | 0  0
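The example can be confirmed by a short computation. In the sketch below, cosets of 12Z in 6Z are represented by 0 and 6, cosets of 4Z in 2Z by 0 and 2, and the dictionary `match` (a name of my choosing) records the proposed correspondence. The sets I + S and I ∩ S are infinite, so they are compared with 2Z and 12Z only inside a finite window.

```python
# Representatives of the cosets of 12Z in 6Z, and of 4Z in 2Z.
A = [0, 6]
B = [0, 2]
match = {0: 0, 6: 2}  # the correspondence 0 <-> 0, 6 <-> 2

add_ok = all(
    match[(a + b) % 12] == (match[a] + match[b]) % 4 for a in A for b in A
)
mul_ok = all(
    match[(a * b) % 12] == (match[a] * match[b]) % 4 for a in A for b in A
)

# Finite-window check that 4Z + 6Z = 2Z and that 4Z meets 6Z in 12Z.
window = range(-24, 25)
I4 = {x for x in window if x % 4 == 0}
S6 = {x for x in window if x % 6 == 0}
sums = {a + s for a in I4 for s in S6 if a + s in window}
evens = {x for x in window if x % 2 == 0}
multiples_of_12 = {x for x in window if x % 12 == 0}
```

In both factor rings every product is zero, so neither has an identity; the isomorphism matches their addition tables, which are those of a two-element group.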


2.9 Polynomials. Like sets, polynomials are easy to understand, but
difficult to define; we must make the attempt.

Usually, a polynomial is written as a sum of terms, where each term is a 
product of a coefficient and a power of an ‘indeterminate’ x. Traditionally, the 
coefficients are real numbers, and a polynomial is regarded as a function from 
R to R. In keeping with the spirit of abstract algebra, we allow the elements of 
any ring R as coefficients; and we do not care what a polynomial really is, as 
we are only interested in the rules for adding and multiplying polynomials. (In 
fact, over some rings, different polynomials define the same function. We saw in
Section 2.2 that x² = x for all elements x of a Boolean ring R; so the polynomials
x and x² would define the same function.)

Clearly, a polynomial is specified by giving its coefficients. But even these
are not uniquely determined. If a polynomial has degree n, we can add to it an
extra term 0x^(n+1) without changing it. Accordingly, we allow a polynomial to
have infinitely many terms, but specify that in all but a finite number of them
the coefficient is zero.

Now we are ready for the formal definition. 

A polynomial over a ring R is an infinite sequence (a₀, a₁, a₂, ...) of elements
of R, indexed by the non-negative integers, with the property that there exists
an integer n such that a_i = 0 for all i > n. In accordance with the usual notation,
we write the sequence (a₀, a₁, a₂, ...) as a₀ + a₁x + a₂x² + ···, or (if n is as in
the definition) as Σ_{i=0}^{n} a_i x^i.

Addition and multiplication of polynomials are defined by the ‘usual’ rules 
(essentially the ones we saw in Chapter 1): 


(Σ a_i x^i) + (Σ b_i x^i) = Σ c_i x^i, where c_i = a_i + b_i,
(Σ a_i x^i) · (Σ b_i x^i) = Σ d_i x^i, where d_i = Σ_{j=0}^{i} a_j b_{i−j}.

We let R[x] denote the set of polynomials over R, with the above addition and
multiplication.


Theorem 2.13 For any ring R, R[x] is a ring. It is commutative if and only
if R is commutative; it has an identity if and only if R has an identity; but it is
never a division ring.

The proof involves checking all the axioms; it will not be given here. The
important thing to note is that the closure laws hold: adding and multiplying
polynomials produce a sequence which has only finitely many non-zero terms,
hence again a polynomial. Indeed, if we define the degree deg(f) of a non-zero
polynomial to be the greatest integer n for which the coefficient of x^n is non-zero,
then we have


deg(f + g) ≤ max(deg(f), deg(g)),
deg(fg) ≤ deg(f) + deg(g).


(We do not define the degree of the zero polynomial, since it has no non-zero
terms at all. Some people would define its degree to be −∞, while others would
say −1; but these are mere conventions.)
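The rules for adding and multiplying polynomials are easy to program if a polynomial is stored as its finite list of coefficients with trailing zeros removed. The following sketch takes the coefficient ring to be the integers mod n; that choice and all function names are assumptions made for illustration. It also shows that both degree inequalities can be strict.

```python
def trim(coeffs):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


def poly_add(f, g, n):
    """Coefficientwise sum over the integers mod n: c_i = a_i + b_i."""
    size = max(len(f), len(g))
    f = f + [0] * (size - len(f))
    g = g + [0] * (size - len(g))
    return trim((a + b) % n for a, b in zip(f, g))


def poly_mul(f, g, n):
    """Product over the integers mod n: d_i is the sum of a_j * b_(i-j)."""
    if not f or not g:
        return []
    d = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            d[i + j] = (d[i + j] + a * b) % n
    return trim(d)


def degree(f):
    """Degree of a non-zero polynomial; None for the zero polynomial."""
    return len(f) - 1 if f else None


f = [1, 2]                     # the polynomial 1 + 2x
square_mod_5 = poly_mul(f, f, 5)   # 1 + 4x + 4x^2, of degree 2
square_mod_4 = poly_mul(f, f, 4)   # 4x and 4x^2 vanish mod 4, leaving 1
sum_mod_4 = poly_add(f, [0, 2], 4)  # 2x + 2x = 4x vanishes mod 4, leaving 1
```

Over the integers mod 4, where 2 · 2 = 0, the square of 1 + 2x has degree 0 rather than 2, which is why the text states inequalities and not equalities.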

A constant polynomial is a polynomial Σ a_i x^i with a_i = 0 for i > 0. In
other words, the constant polynomials are the zero polynomial and the
polynomials whose degree is zero. They form a subring of R[x] isomorphic to R.
Often, we don’t distinguish carefully between the ring element r and the constant
polynomial r = Σ a_i x^i with a₀ = r and a_i = 0 for i > 0.


Remarks 1. If we consider all the infinite sequences of elements of R, without 
imposing the restriction that only finitely many are non-zero, and use the same 
definitions of addition and multiplication, we obtain another important ring, the 
formal power series ring over R, denoted R[[x]]. (The word ‘formal’ signifies 
that we do not attempt to sum the power series, and are not concerned with 
questions of convergence.) 


2. Here is another definition of polynomials, which avoids the need to consider
infinite sequences at the expense of another complication. Let X denote the set
of all finite sequences (a₀, a₁, ..., a_n) of elements of R. (The number n can take
any value. In particular, we include the empty sequence, which has no terms at
all.) Now we define a relation E on X by the rule that (s, t) ∈ E if t can be
obtained from s by either adding or deleting any number of zeros from the
right-hand end of the sequence. It can be shown that E is an equivalence relation; its
equivalence classes are polynomials. To add or multiply polynomials f and g,
we choose representative sequences s = (a₀, ..., a_n) and t = (b₀, ..., b_m) from
the equivalence classes f and g. We may assume that m = n, by adding zeros to
the shorter sequence. Now define f + g to be the equivalence class of s + t, and
fg the equivalence class of st, where addition and multiplication of sequences
is defined as before. It can be shown that these operations do not depend on 
the choice of representatives of the equivalence classes, so that they are well 
defined; and that, with these operations, the set of equivalence classes is a ring. 
Furthermore, this ring is isomorphic to the ring of infinite sequences which we 
defined before. 

3. The upshot of this section is that you already understand polynomials, and 
you should think of them just as you did before; but they can be put on a proper 
theoretical basis, with some work. 




Exercise 2.12 Let I be an ideal of a ring R. Prove that M_n(I) (the set of n × n
matrices with elements in I) is an ideal of M_n(R). (For an easier question, do the case
n = 2.) Prove also that M_n(R)/M_n(I) ≅ M_n(R/I).


Exercise 2.13 Let I be an ideal in a commutative ring R. Prove that I[x] (the ring
of polynomials over I) is an ideal in R[x]. Prove also that R[x]/I[x] ≅ (R/I)[x].


Exercise 2.14 (*) Let R be a ring with identity. Suppose that J is an ideal in M_n(R).
Prove that there is an ideal I of R such that J = M_n(I). [Hint: let E_ij be the matrix
unit with 1 in row i and column j and 0 everywhere else. Prove that, if A = (a_ij),
then E_ki A E_jl has entry a_ij in the (k, l) position and zeros elsewhere. Now let I be the
set of all elements of R which appear as an entry in some matrix of J. Show that, for
any r ∈ I, the matrix with r in the top-left corner and 0 elsewhere belongs to J. Hence
show that I is an ideal, and that J = M_n(I).]


Exercise 2.15 (a) Show that, in the ring Z, mZ contains nZ if and only if m divides n. 
(b) How many ideals does the ring Zo have? How many of these ideals are maximal 
(in the sense that I is maximal if I ≠ R but no ideal J satisfies I ⊂ J ⊂ R)?
(c) Repeat part (b) for the ring Z_n, where n = p₁^(a₁) ··· p_r^(a_r) and p₁, ..., p_r are distinct
primes, and a₁, ..., a_r are positive integers.


Exercise 2.16 (a) Prove that the Gaussian integers, the complex numbers of the 
form a+ bi, where a, b are integers, form a subring of C. 

(b) Prove that the Eisenstein integers, the complex numbers of the form
a + b√−3, where either a, b are integers or a − ½, b − ½ are integers, form a subring of C.
(So, for example, 1 + /—3 and s { 3/-3 are Eisenstein integers but s —4/—3.is 
not.) 


Exercise 2.17 Let R be a commutative ring and u ∈ R. Show that the map θ:
R[x] → R defined by ‘substituting u for x’; that is, Σ a_i x^i ↦ Σ a_i u^i, is a
homomorphism.


Exercise 2.18 Let R be the ring ℝ[x] of all real polynomials. Define a function θ:
R → ℂ by the rule that fθ = f(i). Prove that θ is a homomorphism, that its image is
ℂ, and that its kernel is the ideal (x² + 1)R consisting of all polynomials divisible by
x² + 1. Hence show that

ℝ[x]/(x² + 1)ℝ[x] ≅ ℂ.


Exercise 2.19 Construct a homomorphism from Z_mn to Z_n, for any positive integers
m, n.


Exercise 2.20 Let Y be a subset of the set X. Let P(X) and P(Y) be the Boolean
rings of subsets of X and Y, respectively. Show that the map θ: P(X) → P(Y) defined
by Aθ = A ∩ Y is a homomorphism, and find its image and kernel.

Exercise 2.21 Let R be the ring of real upper-triangular 2 × 2 matrices (those of the
form [a b; 0 c] for a, b, c ∈ ℝ, where the rows of a matrix are separated by a
semicolon). Let I be the set of strictly upper-triangular matrices (of the form
[0 b; 0 0]) and S the set of diagonal matrices (of the form [a 0; 0 c]).

(a) Prove that I is an ideal of R.
(b) Prove that S is a subring of R. Is it an ideal?
(c) Prove that R/I is isomorphic to S.


Factorisation 


One of the most important properties of the integers is the so-called Fundamental 
Theorem of Arithmetic, which asserts that any integer can be factorised into 
primes in an essentially unique way. We want to examine the rings in which such 
a result could hold. 


2.10 Zero-divisors and units. In contrast to the situation in Z, it can 
happen in an arbitrary ring that the product of two non-zero elements is zero. 
For example, 2 · 2 = 0 in Z₄, and [1 0; 0 0] · [0 0; 0 1] = [0 0; 0 0] in M₂(R)
(writing a 2 × 2 matrix row by row, with rows separated by a semicolon). For much of the
rest of this chapter, we are especially interested in rings in which this does not
happen. Accordingly, we define it away:


• A zero-divisor in a ring R is a non-zero element a ∈ R such that there
  exists a non-zero element b ∈ R with ab = 0.

• An integral domain is a commutative ring with identity which has no
  zero-divisors.


Strictly speaking, in the first definition, a is a left zero-divisor, and b is a right
zero-divisor; but, as the second definition suggests, we are mostly interested
in commutative rings, and in these, the concepts of left and right zero-divisor
coincide.

The condition ‘no zero-divisors’ can also be stated in the form: if ab = 0, then 
either a = 0 orb = 0. Thus, Z, our prototype of a ring, is also our prototype of an 
integral domain, as the name would suggest. Integral domains have many nice 
properties. For example, there is a multiplicative version of the cancellation 
law: 


(C′) In an integral domain, if ab = ac and a ≠ 0, then b = c.


For, if ab = ac, then a(b − c) = 0; in an integral domain, if a ≠ 0, this implies
that b − c = 0; that is, b = c.

Let R be a ring with identity. The element a ∈ R is a unit if there exists
b ∈ R with ab = ba = 1. The element b is called the inverse of a, and is written
a⁻¹. It is unique; for if b and c are both inverses of a, then


c = 1c = (ba)c = b(ac) = b1 = b.


You should compare this with the proof of uniqueness of additive inverses in 
Section 2.3. 


Proposition 2.14 Let R be a ring with identity. 

(a) The identity is a unit; it is equal to its inverse.
(b) If a is a unit, then so is a⁻¹; its inverse is a.
(c) If a and b are units, then so is ab; its inverse is b⁻¹a⁻¹.


Proof (a) 1 · 1 = 1.
(b) This is shown by the equations aa⁻¹ = a⁻¹a = 1.
(c) We have


(ab)(b⁻¹a⁻¹) = a(bb⁻¹)a⁻¹ = a1a⁻¹ = aa⁻¹ = 1,


and, similarly, (b⁻¹a⁻¹)(ab) = 1.


Here is how Hermann Weyl explains part (c) of this proposition in his book 
Symmetry (1952). 


With this rule, although perhaps not with its mathematical expres- 
sion, you are all familiar. When you dress, it is not immaterial in 
which order you perform the operations; and when in dressing you 
start with the shirt and end up with the coat, then in undressing 
you observe the opposite order; first take off the coat and the shirt 
comes last. 


Examples 1. A ring with identity is a division ring if and only if every non-zero 
element is a unit. 


2. The units in Z are 1 and −1.


3. In the next proposition, we find the zero-divisors and units in the ring Z_n
of integers mod n, where n > 1.


Proposition 2.15 An element x ≠ 0 of Z_n is a zero-divisor if and only if x
and n have greatest common divisor (g.c.d.) greater than 1; it is a unit if and
only if x and n have greatest common divisor 1.


In other words, in Z_n, every non-zero element is either a zero-divisor or a
unit (but not both, see Exercise 2.22). 


Proof Suppose that d = gcd(x, n) is the greatest common divisor of x and n.

(a) If d > 1, then n/d is a non-zero element of Z_n; and x(n/d) =
(x/d)n ≡ 0 (mod n), so x is a zero-divisor.

(b) Suppose that d = 1. By the Euclidean algorithm, there are integers p and
q such that xp + nq = 1. But this means that xp = px ≡ 1 (mod n), so that x is a unit
(and p is its inverse).

Conversely, if x is a zero-divisor, then x is not a unit, so d is not 1 — that 
is, d > 1 — and similarly for (b). 
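
Proposition 2.15 is easy to check by machine for small n. The following Python sketch (the function names classify and brute_force are ours, chosen only for illustration) sorts the non-zero elements of Z_n into units and zero-divisors using the g.c.d. criterion, and compares the answer with a direct search based on the definitions.

```python
from math import gcd

def classify(n):
    """Split the non-zero elements of Z_n into units and zero-divisors by g.c.d."""
    units = [x for x in range(1, n) if gcd(x, n) == 1]
    zero_divisors = [x for x in range(1, n) if gcd(x, n) > 1]
    return units, zero_divisors

def brute_force(n):
    """Find units and zero-divisors of Z_n directly from the definitions."""
    nonzero = range(1, n)
    units = [x for x in nonzero if any(x * y % n == 1 for y in nonzero)]
    zero_divisors = [x for x in nonzero if any(x * y % n == 0 for y in nonzero)]
    return units, zero_divisors

units, zero_divisors = classify(12)
print(units)          # [1, 5, 7, 11]
print(zero_divisors)  # [2, 3, 4, 6, 8, 9, 10]
print(classify(12) == brute_force(12))  # True
```

The two lists together exhaust the non-zero elements and do not overlap, as the remark after the proposition says.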


Two elements a, b of the integral domain R are said to be associates if there
is a unit u ∈ R such that b = au. It follows from Proposition 2.14
that being associates is an equivalence relation: it is
• reflexive, since a = a1;
• symmetric, since b = au implies a = bu^{-1};
• transitive, since b = au and c = bv imply c = a(uv).


(Here u and v are units; and the proposition shows that 1, u^{-1} and uv are units.)
By the Equivalence Relation Theorem, R is partitioned into equivalence 
classes, called associate classes. 
For example, in Z, the associate classes are the sets {n, −n} for all
non-negative integers n.


2.11 Irreducibles and factorisation. In this section, we examine the pos- 
sibility of factorising elements of a ring into ‘irreducible’ elements (which cannot 
themselves be further factorised), and look at a special class of rings in which 
the analogue of the Fundamental Theorem of Arithmetic holds. 

First, we will make some simplifying assumptions about the ring R. We always 
assume that R is commutative, so that we can regard ab and ba as essentially the 
same factorisation of a ring element. (So, in a factorisation, we do not care about 
the order of the factors.) Also, we exclude divisors of zero. For, if ab = 0, then 
ac = a(b+c) for any element c, and there is little chance of unique factorisations. 

Accordingly, we assume, in this section and the next two, that 


R is an integral domain. 


Also, units provide another problem. In Z, we regard 2 · 3 and (−2) · (−3)
as ‘essentially the same’ factorisation of 6. More generally, if x = ab, and u is a
unit with inverse v, then x = (au)(vb), and we want to think of this as the same
factorisation. We note that a and au are associates, as are b and vb. So we think 
of two factorisations as the same if the factors in one are associates of factors 
in the other. For the same reason, we do not regard units as counting towards a 
factorisation, or we could multiply them ad infinitum. 

This leads us to the appropriate definitions. 


Definition Let R be an integral domain. 


• An element p ∈ R is irreducible if p is not zero or a unit, and if, whenever
p = ab, either a or b is a unit (and the other is an associate of p).
• R is a unique factorisation domain or UFD if it holds that
(a) every element other than zero and units can be factorised into
irreducibles;
(b) if p_1 ⋯ p_m = q_1 ⋯ q_n, where the p_i and q_j are irreducibles, then m = n,
and (possibly after re-ordering) p_i and q_i are associates for i = 1, ..., n.


Briefly, condition (a) says that factorisations into irreducibles exist, while (b) 
says that they are ‘unique up to order and associates’. (Note that any associate 
of an irreducible is irreducible.) 

So the ‘Fundamental Theorem of Arithmetic’ says that Z is a UFD. (We 
will prove this later, in Section 2.13.) However, things here are a little different 
from our first view of the FTA. Instead of factorising positive integers into
positive primes, we factorise arbitrary integers into arbitrary (positive or negative)
primes (remember that p and −p are associates).

In a trivial way, a field is a UFD: it does not have any elements which are 
not zero or units! 

One of the most substantial results about UFDs is the following: 


Theorem 2.16 (Gauss’ Lemma) If R is a UFD, then R[x] is a UFD.


The proof of this result will be given in Chapter 7. 

In particular, if F is a field, then F[x] is a UFD. This will be proved in
Section 2.13, since it uses techniques similar to those for Z. 

An important property of UFDs is that ‘greatest common divisors exist’. 
In the case of the integers, we interpret the word ‘greatest’ in its usual sense 
for numbers. In general, this is not possible; the greatest common divisor is a 
common divisor which is divisible by every common divisor, in a sense which the 
next definition makes precise. 


Definition Let R be a commutative ring. 


• For a, b ∈ R, we say that a divides b (in symbols, a | b) if b = ac for some
c ∈ R.
• The element d is a greatest common divisor or g.c.d. of a and b if
(a) d divides a and d divides b;
(b) for any e ∈ R, if e divides a and e divides b then e divides d.


Thus, the greatest common divisor is not necessarily greatest in any absolute 
sense. In an arbitrary ring, two elements may have no greatest common divisor 
at all. 


Theorem 2.17 (a) In an integral domain, if a divides b and b divides a, then
a and b are associates. 
(b) In an integral domain, if a and b have a greatest common divisor, then any 
two g.c.ds are associates. 
(c) In a unique factorisation domain, every two elements have a greatest 
common divisor. 


Proof (a) If a = 0 then b = 0 and there is nothing to prove. So suppose not.
Let b = ac and a = bd; then a = acd, so a(1 − cd) = 0. Since a ≠ 0 and R is an
integral domain, cd = 1; so c and d are units, and a and b are associates.

(b) If d_1 and d_2 are both g.c.ds of a and b, then (by part (b) of the definition)
each divides the other; so they are associates.

(c) Assume that a and b are non-zero and not units. (Can you deal with
the remaining cases?) Factorise a into irreducibles. Then, up to associates, every 
divisor of a is a product of some of the irreducibles in the factorisation of a. 
(For suppose that a = xy. Factorise x and y into irreducibles. Combining these 
gives a factorisation of a, which must be equal to the given one, up to order 
and associates.) So we find the g.c.d. of a and b by factorising both elements 
into irreducibles, and taking all the irreducibles which (up to associates) occur
in both factorisations. 


If this sounds somewhat complicated, it is just a generalisation of the arg- 
ument which says, for example, that the g.c.d. of 24.37.5?.7 and 2°.3°.5.11 is 
22575: 

This method finds the greatest common divisor, in principle. But it is not 
really an algorithm, since it depends on finding the factorisations of a and b, and 
we do not know how to do this in an arbitrary UFD (only that it can be done). 

We conclude this section with an example of failure of the unique factorisation
property.


Example Let R = {a + b√−5 : a, b ∈ Z}. Then R is a ring, with the usual
definition of addition and multiplication of complex numbers. Moreover, R is an
integral domain.

We first find the units of R. Let a + b√−5 be a unit; suppose that


(a + b√−5)(x + y√−5) = 1.


Taking the square of the modulus of this equation (and using the fact that
|a + b√−5|^2 = a^2 + 5b^2), we obtain


(a^2 + 5b^2)(x^2 + 5y^2) = 1.


Since a, b, x, y are integers, the only possibility is b = y = 0, a^2 = x^2 = 1, so
that a = ±1. So the units are 1 and −1, and the associates of an element r are
r and −r.

Consider the equation


6 = 2 · 3 = (1 + √−5)(1 − √−5).


We claim that all of the factors 2, 3, 1 + √−5, 1 − √−5 are irreducible. Then
certainly the factorisations are not the same up to order and associates!

To show that 2 is irreducible, suppose that


2 = (a + b√−5)(x + y√−5).


Taking the norm squared as before, we obtain


4 = (a^2 + 5b^2)(x^2 + 5y^2).


As before, this implies that b = y = 0, so that a = ±1 or ±2, and x = ±2 or ±1.
So one factor is a unit, and the other is an associate of 2. So 2 is irreducible. By
a very similar argument, all the other factors are irreducible too.

So R is not a unique factorisation domain. 
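
The arithmetic in this example can be checked mechanically. In the Python sketch below (our own illustration; the pair (a, b) stands for a + b√−5, and the function names are hypothetical), we confirm both factorisations of 6 and search for elements of norm 2 or 3. The factors 2, 3 and 1 ± √−5 have norms 4, 9 and 6, so a factorisation of any of them into two non-units would need a factor of norm 2 or 3; the search shows that no such element exists.

```python
def norm(a, b):
    """Squared modulus of a + b*sqrt(-5)."""
    return a * a + 5 * b * b

def multiply(u, v):
    """Product of u = (a, b) and v = (x, y), each standing for a + b*sqrt(-5)."""
    a, b = u
    x, y = v
    return (a * x - 5 * b * y, a * y + b * x)

def elements_of_norm(n):
    """All (a, b) with a^2 + 5*b^2 == n; a finite search works since |a|, |b| <= sqrt(n)."""
    bound = int(n ** 0.5) + 1
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]

print(multiply((2, 0), (3, 0)))    # (6, 0)
print(multiply((1, 1), (1, -1)))   # (6, 0)
print(elements_of_norm(1))         # [(-1, 0), (1, 0)]  (the units)
print(elements_of_norm(2), elements_of_norm(3))  # [] []
```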


2.12 Principal ideal domains or PIDs. In the ring Z, every ideal consists
of all multiples of a fixed integer. This is a very important property, which we 
now study in general. 


Definition Let a_1, ..., a_n be elements of a ring R. The ideal generated by
a_1, ..., a_n, denoted by ⟨a_1, ..., a_n⟩, is the smallest ideal containing these
elements. (‘Smallest’ has the sense of inclusion; it is a subset of every ideal that
contains a_1, ..., a_n.) Be aware that this is often written as (a_1, ..., a_n). However,
this risks confusion with the n-tuple (a_1, ..., a_n). Angle brackets as used here
are very common in mathematics to convey the idea of generation.


From the definition, it is not obvious that such an ideal exists. It can be 
shown that it does exist in any ring. But in a special case, it is easy to describe: 


Proposition 2.18 Let R be a commutative ring with identity, and let a_1, ..., a_n
be elements of R. Then


⟨a_1, ..., a_n⟩ = {x_1a_1 + x_2a_2 + ⋯ + x_na_n : x_1, ..., x_n ∈ R}.


Proof Let I = {x_1a_1 + ⋯ + x_na_n : x_1, ..., x_n ∈ R}, the set of all linear
combinations of a_1, ..., a_n. We have to show that I is an ideal, that I contains
a_1, ..., a_n, and that any ideal containing a_1, ..., a_n necessarily contains all of I.

I is an ideal: (a) If a, b ∈ I, say a = x_1a_1 + ⋯ + x_na_n and b = y_1a_1 + ⋯ + y_na_n,
then


a − b = (x_1 − y_1)a_1 + ⋯ + (x_n − y_n)a_n ∈ I.

(b) If a = x_1a_1 + ⋯ + x_na_n ∈ I and r ∈ R, then


ar = ra = (rx_1)a_1 + ⋯ + (rx_n)a_n ∈ I.


So I passes the Ideal Test.

I contains a_1, ..., a_n: for 1 ≤ i ≤ n, we have


a_i = 0a_1 + ⋯ + 0a_{i−1} + 1a_i + 0a_{i+1} + ⋯ + 0a_n ∈ I.


Any ideal containing a_1, ..., a_n contains I: Let J be an ideal of R containing
a_1, ..., a_n. For any x_1, ..., x_n ∈ R, we have x_ia_i ∈ J, and hence x_1a_1 + ⋯ +
x_na_n ∈ J (using the fact that J is an ideal, and so is closed under addition and
under multiplication by elements of R). So every element of I is in J, which
means that I ⊆ J.


An ideal of R is principal if it is generated by a single element. By the 
proposition, if R is a commutative ring with identity, then a principal ideal is of 
the form ⟨a⟩ = aR = {ax : x ∈ R}. In other words, ⟨a⟩ consists of all elements
divisible by a. In an integral domain, principal ideals have some further nice 
properties: 


Proposition 2.19 Let R be an integral domain. 
(a) For a, b ∈ R, if ⟨a⟩ = ⟨b⟩, then a and b are associates.
(b) For a, b ∈ R, if ⟨a, b⟩ = ⟨d⟩, then d is a greatest common divisor of
a and b.


Proof (a) Suppose that ⟨a⟩ = ⟨b⟩. Then a ∈ ⟨b⟩ = bR, so b divides a, and
similarly a divides b. By Theorem 2.17, a and b are associates.

(b) Suppose that ⟨a, b⟩ = ⟨d⟩. Then a, b ∈ ⟨d⟩, so (as above) d divides both
a and b. On the other hand, d ∈ ⟨a, b⟩, so d = ax + by for some x, y ∈ R. Let e
be any common divisor of a and b. Then a = eu and b = ev for some u, v. Then
we have d = ax + by = eux + evy = e(ux + vy); that is, e divides d. So d is a
greatest common divisor.


Definition A principal ideal domain or PID is an integral domain with the 
property that every ideal is principal. 


Proposition 2.20 (a) Let R be a PID. Then any two elements a, b ∈ R have
a greatest common divisor d, which can be written in the form d = ax + by
for some x, y ∈ R.
(b) Z is a PID.


Proof (a) The ideal ⟨a, b⟩ is principal, and hence has the form ⟨d⟩ for some d.
Now apply part (b) of the previous proposition; since d ∈ ⟨a, b⟩, Proposition 2.18
gives d = ax + by for some x, y ∈ R.

(b) We found that every ideal of Z has the form nZ, that is, ⟨n⟩ in the present
notation.


What has all this to do with factorisation? The following important result 
holds: 


Theorem 2.21 Every principal ideal domain is a unique factorisation domain. 


Proof Let R be a PID. We have to show two things: that elements of R can be
factorised into irreducibles; and that the factorisation of an element is unique (up 
to order and associates). The first part is quite substantial; the proof is deferred 
until Chapter 7. I will show here that factorisations are unique. This depends on 
the following fact. 


Proposition 2.22 Let p be an irreducible element in a PID R. If p divides ab, 
then p divides a or p divides b. 


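Proof Suppose that p divides ab but p does not divide a. Since p is irreducible,
its only divisors are units and associates of p, and no associate of p divides a;
so the greatest common divisor of p and a is 1. (Remember that g.c.ds exist in a
PID.) By Proposition 2.20, there
exist x, y ∈ R such that px + ay = 1. Multiplying this equation by b, we obtain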
pxb + aby = b. Now p clearly divides pxb; and p divides aby (since p divides ab 
by assumption); so p divides b. The result is proved. 


It follows that if p is irreducible and p divides a_1 ⋯ a_n, then p divides a_i for
some i.

This property fails in the ring R = {a + b√−5 : a, b ∈ Z} discussed in the
preceding section. For 2 divides 6 = (1 + √−5)(1 − √−5), but 2 does not divide
either factor.

Now we return to the proof that a PID is a UFD. Suppose that we have two
factorisations of an element of the PID R, say


a = p_1p_2 ⋯ p_m = q_1q_2 ⋯ q_n,


where the p_i and q_j are irreducible. Now p_1 divides q_1 ⋯ q_n, so by the remark
following the proposition, p_1 divides q_j for some j. By re-ordering the product,
we can assume that p_1 divides q_1. Since p_1 and q_1 are irreducible, they must be
associates, say p_1 = q_1u for some unit u. Then we have


p_1p_2 ⋯ p_m = (q_1u)(u^{-1}q_2) ⋯ q_n.


By the Cancellation Law,


p_2 ⋯ p_m = q_2′ ⋯ q_n,


where q_2′ = u^{-1}q_2, an associate of q_2. Continuing in this manner we find that
m = n and that p_i and q_i are associates for all i (after suitable re-ordering), and
we are done.


We close this section with an example of a UFD which is not a PID. This is
the ring Z[x] of polynomials over the integers. It is a UFD by Gauss’ Lemma. The
g.c.d. of the elements 2 and x is obviously 1; but there do not exist polynomials
f and g such that 2f(x) + xg(x) = 1, since the constant term of the left-hand
side is even. Said otherwise, the ideal ⟨2, x⟩ generated by 2 and x (which is the
set of all polynomials whose constant term is even) cannot be generated by a
single element.


2.13 Euclidean domains or EDs. We now look at an even more specialised 
class of rings (which, however, includes our prototype Z as well as polynomial 
rings over fields). 


Definition Let R be a commutative ring with identity. 


• A Euclidean function on R is a function d from the set of non-zero
elements of R to the non-negative integers which satisfies
(a) d(ab) ≥ d(a) for a, b ≠ 0;
(b) if a, b ∈ R with b ≠ 0, then there exist q, r ∈ R with a = bq + r and
either r = 0 or d(r) < d(b).
• R is a Euclidean domain if there exists a Euclidean function on R.


Examples 1. Z is a ED. Take d(a) = |a| for non-zero a ∈ Z. If also b ≠ 0, then
clearly
d(ab) = |ab| = |a| · |b| = d(a)d(b) ≥ d(a),
since d(b) ≥ 1. For (b), suppose that b ≠ 0. If b > 0, divide a by b to obtain a
quotient q and remainder r; then a = bq + r with 0 ≤ r < b; that is, r = 0 or
d(r) < d(b) = b. If b < 0, divide a by −b instead.

2. For any field F, the polynomial ring F[x] is a ED. In this case, we take
the Euclidean function to be d(f) = deg(f), the degree of the polynomial f.
(Recall that we did not define deg(f) if f = 0, but we do not need a value for
d(0) either.)

(a) If f, g are non-zero, then d(fg) = d(f) + d(g) ≥ d(f).

(b) Suppose that g ≠ 0; we wish to find q, r with f = gq + r and r = 0 or
deg(r) < deg(g). The proof is by induction on the degree of f. Let m = deg(f),
n = deg(g). If m < n, we can take q = 0 and r = f. So suppose that m ≥ n. Let


f(x) = a_m x^m + lower terms,
g(x) = b_n x^n + lower terms,


where a_m and b_n are non-zero. Put


f_1(x) = f(x) − (a_m b_n^{-1}) x^{m−n} g(x).


(This is defined since the coefficients form a field and b_n ≠ 0.) The coefficient of
x^m in f_1 is a_m − (a_m b_n^{-1}) b_n = 0, and clearly there are no terms of higher degree.
So deg(f_1) < m = deg(f) (or f_1 = 0, in which case we take q_1 = r_1 = 0). By the
induction hypothesis, we have f_1 = gq_1 + r_1,
where r_1 = 0 or deg(r_1) < deg(g). Then


f = g(a_m b_n^{-1} x^{m−n} + q_1) + r_1;


so we can take q = a_m b_n^{-1} x^{m−n} + q_1, r = r_1.
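
The inductive step above is exactly one round of polynomial long division. As an illustration only, here is a Python sketch over the rational numbers (our choice of field; the names degree and divide and the coefficient-list representation are assumptions made for this example). A polynomial is stored as a list [c_0, c_1, ...] of coefficients, and the induction on deg(f) is unwound into a loop that repeatedly subtracts (a_m b_n^{-1}) x^{m−n} g(x).

```python
from fractions import Fraction

def degree(f):
    """Degree of a polynomial given as a coefficient list [c0, c1, ...]; None for zero."""
    for i in range(len(f) - 1, -1, -1):
        if f[i] != 0:
            return i
    return None

def divide(f, g):
    """Return (q, r) with f = g*q + r and r = 0 or deg(r) < deg(g), over the rationals."""
    n = degree(g)
    if n is None:
        raise ZeroDivisionError("g must be non-zero")
    r = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(f) - n, 1)
    m = degree(r)
    while m is not None and m >= n:
        c = r[m] / Fraction(g[n])      # a_m * b_n^{-1}
        q[m - n] += c
        for i in range(n + 1):         # subtract c * x^(m-n) * g(x)
            r[i + m - n] -= c * g[i]
        m = degree(r)
    return q, r

# Divide x^3 + 2x + 1 by x^2 + 1: quotient x, remainder x + 1.
q, r = divide([1, 2, 0, 1], [1, 0, 1])
print(q == [0, 1], r == [1, 1, 0, 0])  # True True
```

Exact rational arithmetic (Fraction) is used so that b_n^{-1} is computed without rounding; with floating-point coefficients the test "coefficient equals zero" would not be reliable.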


The reason for the term ‘Euclidean domain’ is that this is the class of rings in 
which the Euclidean Algorithm for finding the greatest common divisor of two 
elements can be made to work. We met the Euclidean Algorithm for integers in 
Chapter 1. The general case is exactly the same. I will present it here in a way 
influenced by computer programming, as a recursive algorithm. But you do not 
need to know anything about computers in order to follow this. Just remember 
that an algorithm takes some data as input and produces some other data as 
output; we must specify what the algorithm is expected to do, and then we must 
prove that the algorithm really does what is claimed. 


Euclidean Algorithm Let R be a Euclidean domain. 

Input: Two elements a, b ∈ R.

Output: An element c ∈ R which is a greatest common divisor of a and b.
We write c = gcd(a, b) for this output.

Operation: If b = 0, then set gcd(a, b) = a.

Otherwise, choose q, r ∈ R such that a = bq + r with either r = 0 or d(r) <
d(b); set gcd(a, b) = gcd(b, r).
It is not clear that we have defined anything: why should gcd(b, r) be easier
to calculate than gcd(a, b)? Imagine that we are given two elements a and b,
and are trying to find gcd(a, b). If b = 0, we obtain immediately the result a.
Suppose not. Now observe that either r = 0, in which case we finish at the second 
step, or we have to calculate gcd(b,r) with d(r) < d(b). During the calculation, 
the second alternative can only occur finitely often, since the value of d on the 
second argument of the function is strictly smaller at each instance, and a strictly 
decreasing sequence of non-negative integers cannot continue for ever. So, after 
a finite number of steps, the algorithm does terminate and produce a result. 

Now, we have to show that it gives the correct result. The proof is by induction 
on d(b) (taking the base case of the induction to be b = 0). In order to do this, 
we need to show two things: 


(a) the greatest common divisor of a and 0 is a; 
(b) if a = bq + r, then the greatest common divisor of a and b is equal to the
greatest common divisor of b and r (up to associates).


The first fact should be clear if a ≠ 0; you should think about it and convince
yourself that it also holds if a = 0. (Use the definition of greatest common divisor,
rather than any prejudices about greatest integers, etc.)

For the second point, observe that any divisor of a and b also divides r =
a − bq, while any divisor of b and r also divides a = bq + r. So the set of all
common divisors of a and b is the same as the set of common divisors of b and
r, and the greatest common divisors must be associates, as required.
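
For readers who do like to see a program, the recursive description of the Euclidean Algorithm translates almost word for word into code. The sketch below is for R = Z with d(a) = |a| (the function name euclid_gcd is ours). It relies on Python's built-in divmod, which returns q and r with a = bq + r and |r| < |b|.

```python
def euclid_gcd(a, b):
    """Recursive Euclidean Algorithm in Z, with Euclidean function d(a) = |a|."""
    if b == 0:
        return a
    q, r = divmod(a, b)      # a == b*q + r, and r == 0 or |r| < |b|
    return euclid_gcd(b, r)

print(euclid_gcd(30, 12))    # 6
print(euclid_gcd(12, -18))   # -6
```

The second call returns −6 rather than 6. This is not an error: −6 and 6 are associates, and the algorithm only determines the g.c.d. up to associates.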


As we saw in the case of the integers, the Euclidean Algorithm has another feature;
it can be used to express the g.c.d. of a and b as a linear combination of
these two elements. This is also true in general:


Enriched Euclidean Algorithm Let R be a Euclidean domain.

Input: Two elements a, b ∈ R.

Output: An element c ∈ R which is a greatest common divisor of a and b,
together with two elements x and y such that c = ax + by.

Operation: If b = 0, then we set c = a, x = 1, y = 0.

Otherwise, we write a = bq + r with r = 0 or d(r) < d(b) as usual, and apply
the algorithm to b and r. Suppose that the output is c′, x′, and y′. Then we put
c = c′, x = y′, and y = x′ − qy′.


The proof that this algorithm terminates, and finds the g.c.d. correctly, is
exactly as for the original version. We have to show that c = xa + yb. This is
obvious in the first case of the algorithm. In the second case (arguing, as before,
by induction), we may assume that c′ = bx′ + ry′. Then


c = c′ = bx′ + (a − bq)y′ = ay′ + b(x′ − qy′),


as required. 
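
The enriched algorithm can be sketched in the same way, again for R = Z and with a function name (enriched_euclid) of our own choosing. The return statement in the recursive case implements the rule x = y′, y = x′ − qy′.

```python
def enriched_euclid(a, b):
    """Return (c, x, y) with c a g.c.d. of a and b in Z and c == a*x + b*y."""
    if b == 0:
        return a, 1, 0
    q, r = divmod(a, b)                   # a == b*q + r
    c, x1, y1 = enriched_euclid(b, r)     # c == b*x1 + r*y1
    return c, y1, x1 - q * y1

c, x, y = enriched_euclid(30, 12)
print(c, x, y)             # 6 1 -2
print(30 * x + 12 * y)     # 6
```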
Now we turn to some theoretical properties of Euclidean domains. 


Proposition 2.23 (a) A Euclidean domain is an integral domain.
(b) If R is a Euclidean domain and a, b are non-zero elements with a | b and
d(b) = d(a), then a and b are associates.


Proof (a) If a and b are non-zero, then d(ab) ≥ d(a), so ab ≠ 0.

(b) By condition (b), a = bq + r, where r = 0 or d(r) < d(b) = d(a). Suppose that
r ≠ 0. Since a divides b, a divides r = a − bq; by condition (a), d(r) ≥ d(a), a
contradiction. So we must have r = 0, whence a = bq. So each of a and b divides
the other, and these elements are associates.


Now we come to the main result: 


Theorem 2.24 (a) A Euclidean domain is a principal ideal domain. 
(b) A Euclidean domain is a unique factorisation domain. 


Proof (a) Let R be a Euclidean domain. Take any ideal I of R: we have to
show that I is a principal ideal. The argument is similar to our determination
of the ideals in Z. First, 0 ∈ I; and, if I consists only of 0, then I = ⟨0⟩, and
so I is principal. So we may suppose that I contains some non-zero elements.
Choose a non-zero element a ∈ I such that d(a) is as small as possible. (This depends
on the fact that the values of d are non-negative integers, so there is necessarily
a smallest one.) We claim that I = ⟨a⟩. As usual, we have to show that any
element of either set is contained in the other. First, take x ∈ ⟨a⟩; then x is of
the form x = ar for some r ∈ R, and so x ∈ I (since a ∈ I and I is an ideal).
Conversely, take x ∈ I. By part (b) of the definition of a Euclidean function,
we write x = aq + r, where r = 0 or d(r) < d(a). Now x ∈ I and aq ∈ I, so
r = x − aq ∈ I; and, since a was chosen as an element of I with d(a) as small
as possible, it cannot happen that d(r) < d(a), so we must have r = 0, and
x = aq ∈ ⟨a⟩. Thus indeed I = ⟨a⟩.

(b) If we had proved Theorem 2.21, this would immediately follow from (a). Since 
we didn’t do that, we have some work to do. We showed that factorisation is 
unique (up to order and associates) in any PID; so we only have to do the other 
part, to prove that any element (other than zero and units) has a factorisation 
in R. So take a ∈ R with a ≠ 0 and a not a unit. We show by induction on d(a)
that a has a factorisation. In other words, we assume that any element b with
d(b) < d(a) has a factorisation.

If a is irreducible, then we have a factorisation (with only one factor!), so
suppose that a = bc, where neither b nor c is a unit. Now by condition (a),
d(b) ≤ d(a) and d(c) ≤ d(a). If d(b) < d(a) and d(c) < d(a), then by the inductive
hypothesis, both b and c have factorisations; combining these gives a factorisation
of a. So we can suppose that d(b) = d(a). But then a and b are associates, by
part (b) of Proposition 2.23, so that c is a unit: a contradiction. The proof is finished.


Let us summarise our findings. We have three classes of integral domains: 


e unique factorisation domains; 
e principal ideal domains; 
e Euclidean domains. 
Theorems 2.21 and 2.24 show that
ED ⇒ PID ⇒ UFD.


We see the increasing strength of these conditions by looking at the facts about 
greatest common divisors: 


• In a UFD, any two elements have a g.c.d.;

• In a PID, any two elements a, b have a g.c.d. d, and d = xa + yb for some
x, y;

• In a ED, any two elements a, b have a g.c.d. d, and d = xa + yb for some
x, y; moreover, d, x, y can be found by using the Euclidean Algorithm.


You might expect here an example of a PID which is not a ED. Such rings 
do exist; T. S. Motzkin showed that the ring 


R = {a + b√−19 : either a, b ∈ Z or a − 1/2, b − 1/2 ∈ Z}


is an example. But the proof is more difficult. (You can read it in volume 55 of 
the Bulletin of the American Mathematical Society, starting on page 1142.) 


Exercise 2.22 Show that, in a commutative ring with identity, no element can be 
both a zero-divisor and a unit. 


Exercise 2.23 Write down the associate classes in the ring Z_12.


Exercise 2.24 (*) Find all positive integers m with the property that every unit a in
Z_m satisfies a^2 = 1.


Exercise 2.25 Let R be an integral domain. Show that the units of R[x] are precisely
the constant polynomials which are units of R. 


Exercise 2.26 (a) Let F be a field. Show that a matrix A ∈ M_n(F) is a unit if
and only if det(A) ≠ 0; and that a non-zero matrix A is a zero-divisor if and only if
det(A) = 0.

(*) (b) More generally, let R be a commutative ring with identity. Prove that
A ∈ M_n(R) is a unit if and only if det(A) is a unit in R. [Hint: If det(A) is a unit,
use the ‘cofactor formula’ to find an inverse of A. Conversely, if AB = BA = I, take
determinants to show that det(A) is a unit.]


Exercise 2.27 Let x be an element in a ring with identity, and suppose that x^n = 0 for
some positive integer n. Prove that 1 + x is a unit. [Hint: (1 + x)(1 − x + x^2 − x^3 + ⋯) = 1.]


Exercise 2.28 (a) Find the greatest common divisor of the real polynomials f(x) = 
x? + 3ae4+2 and g(x) = 2° + 2a4 4+ 5x? + 6a 42. 
(b) Give a simple description of the ideal ⟨f(x), g(x)⟩ of R[x].


Exercise 2.29 (a) Show that the ring R = {x + yi : x, y ∈ Z} of Gaussian integers is
a Euclidean domain, with Euclidean function d(x + yi) = x^2 + y^2. [Hint: For condition (b)
of the definition, take
a, b ∈ R with b ≠ 0, and write a/b = u + vi in C, where u and v are rational numbers.
Then choose m, n ∈ Z such that


|(u + vi) − (m + ni)| ≤ 1/√2,


by considering lattice points with integer coordinates in the complex plane.]
(*) (b) Show that the ring of Eisenstein integers (Exercise 2.16(b)) is a Euclidean
domain, by using the triangular lattice similarly.


Exercise 2.30 (a) Describe the units in the ring of Gaussian integers. 

(*) (b) Show that the irreducibles in the ring of Gaussian integers are of two types: 
primes p € Z which cannot be written as the sum of two integer squares; and elements 
x + yi, where x^2 + y^2 is a prime in Z. (For example, 3 is a ‘Gaussian prime’; 5 is not,
since 5 = (2+ i)(2 —i), but the two factors are both Gaussian primes.) 


Remark A theorem of Number Theory asserts that a prime p can be expressed
as the sum of two squares if and only if p = 2 or p ≡ 1 (mod 4).


Fields 


Recall that a field is a commutative ring with identity in which division is pos- 
sible by non-zero elements. Rings are easy to build: we have seen polynomial 
rings, matrix rings, Boolean rings, cartesian products. Almost always, these rings 
turn out not to be fields. In fact, there are only two standard methods of con- 
structing fields: applied to the integers, they produce the rationals, and the 
integers mod p. 


2.14 Field of fractions. The first method involves going from a ring to its 
‘field of fractions’, which generalises the construction of the rational numbers 
from the integers. 

Let R be an integral domain. A field F is a field of fractions of R if


(a) R is a subring of F;
(b) Any element of F can be written in the form ab^{-1} for some a, b ∈ R (where
b^{-1} is calculated in F).


For example, the rational numbers Q form the field of fractions of the integers 
Z. Any field is its own field of fractions. 


Theorem 2.25 Any integral domain has a field of fractions. 


Proof Let R be an integral domain. We let S be the set of all ordered pairs
(a, b) for a, b ∈ R, b ≠ 0. We intend that the ordered pair (a, b) will represent
the element ab^{-1}. But, of course, ab^{-1} = cd^{-1} if (and only if) ad = bc; so we
want the ordered pairs (a, b) and (c, d) to represent the same element of F if this
condition holds. Accordingly, we define an equivalence relation ~ on S by the
rule

(a, b) ~ (c, d) if and only if ad = bc.

We prove that this really is an equivalence relation. First, (a, b) ~ (a, b), since
ab = ba (R is commutative). Then, if (a, b) ~ (c, d), then ad = bc, and so cb = da;
this means (c, d) ~ (a, b). Finally, suppose that (a, b) ~ (c, d) and (c, d) ~ (e, f).
Then ad = bc and cf = de. So


adf = bcf = bde,


and by the cancellation law (since d ≠ 0) we deduce that af = be; so (a, b) ~
(e, f). So ~ really is an equivalence relation.

Now we let [a, b] denote the equivalence class of the ordered pair (a, b): [a, b] =
{(c, d) : ad = bc}. Let F be the set of equivalence classes. We define addition
and multiplication on F by the following rules:


[a, b] + [c, d] = [ad + bc, bd],
[a, b] · [c, d] = [ac, bd].


(To see where these definitions come from, work out what you would expect
(ab^{-1}) + (cd^{-1}) and (ab^{-1})(cd^{-1}) to be, by the usual rules for fractions.)

We have to check that these operations are well defined, and that they do
indeed make F a field. All of this is straightforward checking. Finally, the map
that takes a to the equivalence class [a, 1] is a one-to-one homomorphism from R
to F, so we can regard R as a subring of F. Moreover, the inverse of the element
[b, 1] is [1, b] if b ≠ 0; and [a, b] = [a, 1][b, 1]^{-1}. So, if we identify R with its image
in F under this embedding, we see that F is indeed a field of fractions of R.


In fact, the field of fractions is unique (up to isomorphism). This is another 
way of saying that the only possible way to construct a field of fractions is the 
way we actually did it. 
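
To make the construction concrete, here is a small Python sketch for the case R = Z (the class name Frac and the method name inverse are ours, purely for illustration). An object stores one representative pair (a, b) of the class [a, b]; equality is the relation ~, and the operations are the two rules displayed in the proof. No reduction to lowest terms is ever performed: the equivalence relation does all the work.

```python
class Frac:
    """The class [a, b] of the pair (a, b), b != 0, over the integral domain Z."""

    def __init__(self, a, b):
        if b == 0:
            raise ZeroDivisionError("second component must be non-zero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # (a, b) ~ (c, d) if and only if ad = bc
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # [a, b] + [c, d] = [ad + bc, bd]
        return Frac(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # [a, b] . [c, d] = [ac, bd]
        return Frac(self.a * other.a, self.b * other.b)

    def inverse(self):
        # The inverse of [a, b] is [b, a]; this needs a != 0.
        return Frac(self.b, self.a)

# Different representatives of the same classes give the same sum.
print(Frac(1, 2) + Frac(1, 3) == Frac(2, 4) + Frac(3, 9))   # True
print(Frac(3, 1) * Frac(3, 1).inverse() == Frac(1, 1))      # True
```

The first printed line is an instance of the ‘well defined’ check mentioned in the proof: replacing (1, 2) by (2, 4) and (1, 3) by (3, 9) changes the representative of the sum but not its class.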


2.15 Maximal ideals and fields. The second method of constructing fields 
generalises the passage from the integers to the integers modulo a prime: the field 
is constructed as a factor ring. To study this, first we need a different test for 
when a ring is a field. 


Proposition 2.26 Let R be a commutative ring with identity. Then R is a 
field if and only if the only ideals in R are {0} and R itself. 


Proof For the forward implication, suppose that R is a field. Take any ideal
I of R. Suppose that I ≠ {0}; we have to show that I = R, that is, that every
element of R is in I. Certainly, some non-zero element is in I, say a ∈ I. Now,
for any x ∈ R, we have x = (xa^{-1})a ∈ I. So I = R.

For the converse, let R be a commutative ring with identity whose only ideals
are {0} and R. We have to show that all its non-zero elements have inverses. So
take a ∈ R with a ≠ 0. Let I be the ideal ⟨a⟩ = aR. Then I ≠ {0}, since
a ∈ I; so I = R. Thus, 1 ∈ I = aR, so there exists b ∈ R with ab = ba = 1, as
required.

We say that the ideal I of the ring R is a maximal ideal of R if I ≠ R
but there is no ideal J properly between I and R; that is, if J is an ideal with
I ⊆ J ⊆ R, then J = I or J = R.


Theorem 2.27 Let R be a commutative ring with identity, and I an ideal of 
R. Then R/I is a field if and only if I is a maximal ideal of R.


Proof This follows immediately from the proposition and the correspondence 
between ideals of R/I and ideals of R containing I given by the Second
Isomorphism Theorem (Theorem 2.11). 


How do we recognise maximal ideals? 


Proposition 2.28 Let R be a principal ideal domain, and take a ∈ R with
a ≠ 0. Then ⟨a⟩ is a maximal ideal of R if and only if a is irreducible.


Proof In an integral domain, we have ⟨a⟩ ⊆ ⟨b⟩ if and only if b divides a, and
⟨a⟩ = ⟨b⟩ if and only if a and b are associates. In particular, ⟨a⟩ = R if and only
if a is a unit (an associate of 1: note that ⟨1⟩ = R). Hence ⟨a⟩ is maximal if and
only if every element b which divides a is either an associate of a or a unit; but
this is exactly the condition that a is irreducible. Finally, if R is a principal ideal
domain, then there are no other ideals to spoil the maximality of ⟨a⟩.


Example R = Z. We see that Z_n is a field if and only if n is prime. (In fact
we knew this already. For we showed that m is a unit in Z_n if and only if m and
n are coprime; and this holds for all non-zero residues mod n if and only if n is
prime.)
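
The criterion in this example is easy to test by machine for small moduli. The following is only a sketch: the function names units, is_field and is_prime are our own, and a residue is counted as a unit exactly when it is coprime to n, as in the parenthetical remark above.

```python
from math import gcd

def units(n):
    """Non-zero residues m mod n with gcd(m, n) = 1, i.e. the units of Z_n."""
    return [m for m in range(1, n) if gcd(m, n) == 1]

def is_field(n):
    """Z_n is a field exactly when every non-zero residue is a unit."""
    return n > 1 and len(units(n)) == n - 1

def is_prime(n):
    """Trial division; adequate for the small moduli used here."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Compare the two conditions for a few moduli.
for n in range(2, 13):
    print(n, is_field(n), is_prime(n), units(n))
```

For each n in the loop the two truth values agree, as the example predicts.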


We will apply this result to polynomial rings in the next section. 


2.16 Field extensions, finite fields. The standard procedure for 
constructing the complex numbers from the real numbers is to ‘adjoin’ a square
root of −1; that is, an element i satisfying i^2 + 1 = 0. We will now describe this
procedure, ‘adjoining the root of a polynomial’, in more detail. 


Theorem 2.29 Let F be a field, and f a polynomial which is irreducible in
F[x]. Then there is a field K containing F and an element α satisfying f(α) = 0.


Proof The construction is simple. We set K = F[x]/(f). This is a field by the
results of the last section: F[x] is a principal ideal domain and we are given that
f is irreducible, so (f) is a maximal ideal in F[x], and F[x]/(f) is a field.

We have to show that 


(a) K contains (a field isomorphic to) F;
(b) K contains a root α of f.


(a) Set I = (f). For a ∈ F, let ā denote the coset I + a, and let F̄ be the set
of all such cosets. We show that the map a ↦ ā is an isomorphism from F to F̄.
It is one-to-one, since if ā = b̄ then b − a ∈ (f), so b − a = 0 (as any non-zero
element of (f) has degree at least as great as that of f). The homomorphism
property is clear.


(b) We take α to be the coset I + x. Let


f(x) = a_n x^n + ··· + a_1 x + a_0.


Then
f(α) = a_n(I + x)^n + ··· + a_1(I + x) + a_0(I + 1)
     = (I + a_n x^n) + ··· + (I + a_1 x) + (I + a_0)
     = I + (a_n x^n + ··· + a_1 x + a_0)
     = I + f(x)
     = I,


and we are done, since I is the zero element of F[x]/I.


Our proof shows that a field with the required properties exists. However, it 
is defined as a factor ring, which is not the most convenient form for calculation. 
As always, calculation in a factor ring is very much easier if we can make a good 
choice of coset representatives! 

Let f be an irreducible polynomial of degree n > 0 over the field F. We claim
that every coset of the ideal (f) in F[x] has a unique representative r satisfying
r = 0 or deg(r) < n. Such an r exists because of the Euclidean property of F[x].
(If g is any polynomial, and g = fq + r, then g and r differ by a multiple of f, and
so (f) + g = (f) + r.) If r_1 and r_2 are two representatives of the same coset with
r_i = 0 or deg(r_i) < n for i = 1, 2, then (f) + r_1 = (f) + r_2, so r_1 − r_2 ∈ (f); this
means that f divides r_1 − r_2. But since deg(f) = n, while r_1 − r_2 is either zero or
of degree less than n, this implies that r_1 − r_2 = 0, so r_1 = r_2.

Moreover, a simple argument (similar to the one in the above proof) shows
that the coset (f) + r is equal to r(α), where α is the coset (f) + x.

This means that 


Every element of K = F[x]/(f) can be uniquely expressed in the
form

c_0 + c_1 α + c_2 α^2 + ··· + c_{n-1} α^{n-1},
where c_0, c_1, ..., c_{n-1} ∈ F.
The addition and multiplication in K are given by the usual
arithmetic rules, with the added condition that f(α) = 0.


The construction of C = R[x]/(x^2 + 1) is a familiar example: every complex
number is uniquely expressible as c_0 + c_1 i, where i^2 = −1.

For another example, let us construct a finite field of order 4. 

We start with the field F = Z_2, with elements 0 and 1. Consider the polynomial
x^2 + x + 1 ∈ F[x]. This polynomial is irreducible, since the only possible
factorisation would be into two linear factors, which would imply that the
polynomial has a root in F; but 0^2 + 0 + 1 ≠ 0 and 1^2 + 1 + 1 ≠ 0. Let
K = F[x]/(x^2 + x + 1), and let α be the coset (x^2 + x + 1) + x, so that α^2 + α + 1 = 0.
The field K has four elements: K = {0, 1, α, α + 1}. (This is our canonical
representation above.) Letting β = α + 1 = α^2 (noting that x = −x in the field K),
we obtain the following tables:


+ | 0 1 α β        · | 0 1 α β
--+--------        --+--------
0 | 0 1 α β        0 | 0 0 0 0
1 | 1 0 β α        1 | 0 1 α β
α | α β 0 1        α | 0 α β 1
β | β α 1 0        β | 0 β 1 α
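
The two tables can be recomputed mechanically from the rule α^2 = α + 1. In the sketch below (our own illustration, not part of the construction itself) an element c_0 + c_1 α is stored as the pair (c_0, c_1) with entries in Z_2; the names ELEMENTS, NAMES, add, mul and table are assumptions made for the example.

```python
# Elements c0 + c1*alpha of the field of order 4, stored as pairs (c0, c1) over Z_2.
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]
NAMES = {(0, 0): "0", (1, 0): "1", (0, 1): "α", (1, 1): "β"}

def add(u, v):
    """Componentwise addition mod 2."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    """Multiply as polynomials in alpha, then replace alpha^2 by alpha + 1."""
    a0, a1 = u
    b0, b1 = v
    c0 = a0 * b0 + a1 * b1              # constant term, including the 1 from alpha^2
    c1 = a0 * b1 + a1 * b0 + a1 * b1    # alpha term, including the alpha from alpha^2
    return (c0 % 2, c1 % 2)

def table(op):
    """Operation table as rows of element names, in the order 0, 1, α, β."""
    return [[NAMES[op(u, v)] for v in ELEMENTS] for u in ELEMENTS]

for symbol, op in (("+", add), ("·", mul)):
    print(symbol, " ".join(NAMES[e] for e in ELEMENTS))
    for u, row in zip(ELEMENTS, table(op)):
        print(NAMES[u], " ".join(row))
```

The printed rows should agree with the tables above; for instance mul((0, 1), (1, 1)) returns (1, 0), which is the entry α · β = 1.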


If p is a prime, and n a positive integer, then a finite field of order p^n can
be constructed in the same way if an irreducible polynomial of degree n over Z_p
can be found. Galois showed that this is always possible:


Theorem 2.30 For any prime number p and any positive integer n, there is an
irreducible polynomial of degree n over Z_p, and hence a finite field of order p^n.


Exercise 2.31 Show that the polynomial x^2 + 1 is irreducible over Z_3, and hence
construct a field of order 9.


Exercise 2.32 Show that the polynomials x^3 + x + 1 and x^3 + x^2 + 1 are both irreducible
over Z_2. Are the corresponding fields of order 8 isomorphic?


Exercise 2.33 Show that, if F is a field with q elements and f an irreducible
polynomial of degree n over F, then the field K = F[x]/(f) has q^n elements.


Exercise 2.34 Prove that any two fields of fractions F_1 and F_2 of an integral domain R
are isomorphic, where the isomorphism θ : F_1 → F_2 can be chosen so that its restriction
to the subring R of F_1 is the identity map.


Exercise 2.35 A subset X of a ring R is called multiplicatively closed if a, b ∈ X
implies ab ∈ X.


(a) Prove that R is an integral domain if and only if the set of non-zero elements
of R is multiplicatively closed.

*(b) Let R be a commutative ring with identity, and let X be a multiplicatively
closed subset of R containing 1 but not 0. Define an equivalence relation ~ on R × X
by the rule that (a, b) ~ (c, d) if and only if ad = bc. Define operations of addition and
multiplication on the set F of equivalence classes of ~ as in Section 2.14. Prove that F
is a ring containing R, in which every element of X has an inverse, and every element
of F can be written as ab^{-1}, where a ∈ R and b ∈ X.


Appendix: Miscellany 
We end with some miscellaneous topics. 


2.17 Cage on zero. The American composer John Cage wrote the following.
What is he talking about? (Think about this before reading the following
discussion.)


Curiously enough, the twelve-tone system has no zero in it. Given
a series: 3, 5, 2, 7, 10, 8, 11, 9, 1, 6, 4, 12 and the plan of obtaining 
its inversion by numbers which when added to the corresponding 
ones of the original series will give 12, one obtains 9, 7, 10, 5, 2, 
4, 1, 3, 11, 6, 8 and 12. For in this system 12 plus 12 equals 12. 
There is not enough of zero in it. 


John Cage (1968). 


I contend that Cage is confusing two different zeros, the zero element of the 
real numbers and the zero element of the integers mod 12. 


Real numbers Cage was very much attracted to the Zen concept of emptiness. 
One of his most famous compositions, entitled 4’33”, involves a pianist sitting 
at the keyboard of a piano for 4 minutes and 33 seconds without striking a note; 
the audience notices the background noise (since no emptiness is truly empty). 
The real numbers represent sound intensity, so zero is the absence of sound. 


Integers mod 12 Musical notation is based on the fact that notes an octave 
apart (that is, when the frequency of one is double that of the other) have a very 
similar subjective effect in melodic terms. So we regard such notes as ‘equivalent’. 
More generally, two notes are equivalent if they are a whole number of octaves 
apart. 

In Western music, only a discrete set of notes is used. The octave is divided 
into 12 intervals called semitones. Thus, the semitones appear (on a keyboard, 
say), stretching to infinity in both directions like the integers. As above, two 
semitones are equivalent if they differ by a whole number of octaves; that is, 
if (as integers) they are congruent mod 12. So the musical scale, for thematic 
purposes, has the structure of the integers mod 12. Various musical operations 
fit into this framework. For example, transposition just involves adding a fixed 
constant to each note. Inversion involves replacing each equivalence class by its 
negative. (This is what Cage describes.) 


Two kinds of zeros The equivalence classes referred to are the congruence 
classes mod 12, that is, the cosets of 12Z in Z. We can make any choice of coset 
representatives we like. Mathematicians usually use 0,1,2,...,11. Musicians use 
12 instead of 0 as the representative of the class 12Z, so that their semitones are 
labelled 1,2,3,...,12. 

Now Cage’s arithmetic checks, since −3 = 9, −5 = 7, and so on, in Z_12 (the
integers mod 12). The mathematician says −0 = 0, the musician −12 = 12; it is
exactly the same, just involving a different choice of coset representative. 
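
Cage's list can be checked directly. The short sketch below negates each note mod 12 and then reports the answer with the musicians' coset representatives 1, ..., 12; the function name negate_mod12 is our own.

```python
def negate_mod12(note):
    """Negative of a note in Z_12, written with the representatives 1..12,
    so that the zero class is reported as 12 rather than 0."""
    result = (-note) % 12
    return 12 if result == 0 else result

series = [3, 5, 2, 7, 10, 8, 11, 9, 1, 6, 4, 12]
inversion = [negate_mod12(x) for x in series]
print(inversion)
```

The output is the inversion that Cage gives, ending with 12, the musicians' name for zero.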

So, contrary to what Cage says, there is a zero in the twelve-tone scale (but 
musicians call it 12); and it has nothing to do with the real number zero, the 
zero of intensity or absence of sound when the pianist is not striking the keys.


Footnote The title of Cage’s piece mentioned above itself blurs the categories
between different kinds of numbers. Four minutes and thirty-three seconds make
273 seconds; and −273 is the temperature of absolute zero in the Celsius scale.
If he had used the Fahrenheit scale, Cage would presumably have titled his piece 
7'39”; but this duration may have taxed the patience of the audience too far! 


2.18 Solution to Exercise 2.10. Exercise 2.10 asks whether it is possible 
to have a ring whose elements are the integers, with (a) the same addition as Z 
but different multiplication, or (b) the same multiplication but different addition. 

Part (a) is easy if you remember the definition of a zero ring from Section 2.2: 
if we are given the operation of addition satisfying axioms (A0)-(A4), and we 
define multiplication by ab = 0 for all a,b, we obtain a ring. (For a more 
challenging question, try to describe all the possible definitions of multiplication 
which would give a ring.) 

Part (b) is much harder. If you tried this question, you probably attempted 
to write down an explicit rule for addition which would make R into a ring. 
I do not know how to do that. Instead, I will give here a solution which is 
non-constructive, and is an illustration of the concept of factorisation, which we 
discussed in Sections 2.10–2.13.

Let R be a ring in which the set of elements is Z and the multiplication is the 
same as that in Z. We start by making a list of properties of R. Any property 
which is defined purely in terms of multiplication, which holds in Z, will hold in 
R. Thus, we have the following: 


(a) R is commutative.
(b) R has an identity element 1.
(c) R has no divisors of zero. (Thus, R is an integral domain.)

(d) R has just two units, 1 and z, where z^2 = 1. (In fact, in Z, we have z = −1,
but −1 depends on the addition, so we cannot assert that z = −1 here.
Confusingly, z is the integer whose name is −1, but we do not know that
it is the additive inverse of 1.)

(e) R has infinitely many irreducibles (by the theorem of Euclid).
(f) R is a unique factorisation domain.


In fact, these properties determine the multiplication in R completely. For,
if they hold, then any non-zero element can be uniquely written as u p_1^{a_1} ··· p_r^{a_r},
where u = 1 or z, p_1, p_2, ... are the irreducibles (one from each associate class),
and a_1, ..., a_r are positive integers. Now the rule for multiplying these elements
is clear.


This means that, if we can find a ring S different from Z having properties 
(a)-(f), then S will have the same multiplication as Z, and so R = S is a solution 
to the problem. 

The simplest example of such a ring is the polynomial ring F[x], where
F = Z_3 is the field of integers mod 3. This is a UFD (since it is a Euclidean
domain); its units are the two non-zero constants; and Euclid’s proof holds virtually
unchanged to show that there are infinitely many irreducibles. (If there were
only finitely many irreducibles, say f_1, ..., f_r, then the polynomial f_1 ··· f_r + 1
would not be irreducible, but would not be divisible by any irreducible, a
contradiction.)

For example, we might let the irreducible polynomials x, x + 1, x − 1, x^2 + 1,
... correspond to the prime numbers 2, 3, 5, 7, .... Then, using ⊕ for the
new addition, we have 1 ⊕ 1 = −1, 3 ⊕ 5 = −2, 4 ⊕ 1 = 7, 7 ⊕ 1 = 15,
and so on.
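
The four sums just quoted can be recomputed by translating integers into polynomials over Z_3, adding there, and translating back. The sketch below is only an illustration under stated assumptions: it knows just the four irreducibles named above (so it handles only integers and polynomials built from them), it stores a polynomial as a tuple of coefficients mod 3 with the constant term first, and all the function names are our own.

```python
# Correspondence prime -> monic irreducible over Z_3: x, x + 1, x - 1, x^2 + 1.
PRIME_TO_POLY = {2: (0, 1), 3: (1, 1), 5: (2, 1), 7: (1, 0, 1)}

def trim(coeffs):
    """Reduce coefficients mod 3 and drop trailing zeros; () is the zero polynomial."""
    coeffs = [c % 3 for c in coeffs]
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return tuple(coeffs)

def poly_add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))

def poly_mul(p, q):
    if not p or not q:
        return ()
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def poly_divmod(p, d):
    """Divide p by the monic polynomial d; return (quotient, remainder)."""
    p = list(p)
    quotient = [0] * max(len(p) - len(d) + 1, 0)
    while len(p) >= len(d):
        lead, shift = p[-1], len(p) - len(d)
        quotient[shift] = lead
        for i, c in enumerate(d):
            p[shift + i] -= lead * c
        p = list(trim(p))
    return trim(quotient), tuple(p)

def to_poly(n):
    """Integer -> polynomial; the unit z (named -1) goes to the constant 2 = -1 in Z_3."""
    if n == 0:
        return ()
    result = (1,) if n > 0 else (2,)
    n = abs(n)
    for prime, poly in PRIME_TO_POLY.items():
        while n % prime == 0:
            n //= prime
            result = poly_mul(result, poly)
    if n != 1:
        raise ValueError("integer has a prime factor outside the small table")
    return result

def to_int(p):
    """Polynomial -> integer, by dividing out the known irreducibles."""
    if not p:
        return 0
    n = 1
    for prime, poly in PRIME_TO_POLY.items():
        while True:
            quotient, remainder = poly_divmod(p, poly)
            if remainder:
                break
            p, n = quotient, n * prime
    if p == (1,):
        return n
    if p == (2,):
        return -n
    raise ValueError("polynomial has an irreducible factor outside the small table")

def new_add(a, b):
    """The addition on the integers transported from Z_3[x]."""
    return to_int(poly_add(to_poly(a), to_poly(b)))

print(new_add(1, 1), new_add(3, 5), new_add(4, 1), new_add(7, 1))
```

Extending the table PRIME_TO_POLY to further primes requires choosing further irreducibles, and the resulting addition depends on that choice.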


2.19 Ideals in matrix rings. A commutative ring R with identity, whose
only ideals are the trivial ones (namely, {0} and R), is necessarily a field (see
Proposition 2.26). This is false if we do not assume commutativity. The ring
M_n(F) of all n × n matrices over the field F has only the trivial ideals, as we
shall see; but it is not a division ring for n > 1, since there are non-zero singular
matrices.


Theorem 2.31 Let R be a commutative ring with identity, and n a positive 
integer. 


(a) If S is an ideal of R, then M_n(S) is an ideal of M_n(R).
(b) Every ideal of M_n(R) is of this form.


Proof (a) If S is an ideal of R, then it is a ring, and so M_n(S) is a ring, so (by
definition) a subring of M_n(R). Now take A = (a_{ij}) ∈ M_n(S), and X = (x_{ij}) ∈
M_n(R). The (i, j) entry of AX is


Σ_{k=1}^{n} a_{ik} x_{kj}.


Now a_{ik} x_{kj} ∈ S, since a_{ik} ∈ S and S is an ideal. Summing over k then gives an
element of S. So AX ∈ M_n(S). Similarly XA ∈ M_n(S). So M_n(S) is an ideal of
M_n(R).

(b) Suppose that T is an ideal of M_n(R). Let S be the set of elements of R
which occur as entries in matrices in T. We show that S is an ideal of R and
that T = M_n(S).

Let E_{ij} denote the matrix with 1 in row i and column j, and 0 in all other
positions. Also, let S′ be the set of all elements x ∈ R such that xE_{11} ∈ T. (Here
xE_{11} is the matrix with x in the top left-hand corner and all other entries zero.)

Step 1 S′ = S.

For clearly S′ ⊆ S. Let x ∈ S; then there is a matrix A = (a_{ij}) ∈ T such that
a_{pq} = x. Now it is easily checked that


E_{1p} A E_{q1} = a_{pq} E_{11} = x E_{11}.


Since T is an ideal, xE_{11} ∈ T, so x ∈ S′.

Step 2 S is an ideal of R.

Take x, y ∈ S and r ∈ R. Then xE_{11}, yE_{11} ∈ T by Step 1, and we have

(x + y)E_{11} = xE_{11} + yE_{11} ∈ T,
(rx)E_{11} = (rE_{11})(xE_{11}) ∈ T,
(xr)E_{11} = (xE_{11})(rE_{11}) ∈ T,


since T is an ideal of M_n(R). So x + y, rx, xr ∈ S, and S is an ideal of R.

Step 3 T = M_n(S).

By definition, T ⊆ M_n(S). Suppose that A = (a_{ij}) ∈ M_n(S). Then


A = Σ_{i,j=1}^{n} E_{i1} (a_{ij} E_{11}) E_{1j} ∈ T,


since a_{ij} E_{11} ∈ T by Step 1 and T is an ideal.
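
The two matrix identities used in Steps 1 and 3 are ‘easily checked’ by hand, and also by machine. The following sketch verifies them for one 3 × 3 integer matrix; it is a numerical check, not a proof, and the helper names matrix_unit, move_to_corner and rebuild are our own.

```python
def matmul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_unit(n, i, j):
    """E_ij with 1-based indices: 1 in row i, column j, and 0 elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(1, n + 1)]
            for r in range(1, n + 1)]

def scale(x, M):
    return [[x * entry for entry in row] for row in M]

def move_to_corner(A, p, q):
    """Compute E_1p A E_q1, which Step 1 says is a_pq E_11."""
    n = len(A)
    return matmul(matmul(matrix_unit(n, 1, p), A), matrix_unit(n, q, 1))

def rebuild(A):
    """Compute the sum over i, j of E_i1 (a_ij E_11) E_1j, as in Step 3."""
    n = len(A)
    total = [[0] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            corner = scale(A[i - 1][j - 1], matrix_unit(n, 1, 1))
            term = matmul(matmul(matrix_unit(n, i, 1), corner), matrix_unit(n, 1, j))
            total = [[t + u for t, u in zip(rt, ru)] for rt, ru in zip(total, term)]
    return total

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(move_to_corner(A, 2, 3))  # the entry a_23 = 6 moved to the top left-hand corner
print(rebuild(A) == A)
```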


Thus, for example, the ring of 2 x 2 matrices over the ring of integers mod 4 
has just three ideals: 


• the zero ideal;
• the ideal consisting of matrices with every entry 0 or 2;
• the whole ring.


A ring R is defined to be simple if the only ideals in R are {0} and R. 
Thus, any field (or, indeed, any division ring) is simple. From Theorem 2.31 we 
immediately conclude: 


Corollary 2.32 Let F be a field, and n a positive integer. Then M_n(F) is a
simple ring. 


Exercise 2.36 Recall the definition of the direct product R × S of two rings R and
S: the elements of R × S are all ordered pairs (r, s), where r ∈ R and s ∈ S; and the
operations are componentwise, that is,


(r_1, s_1) + (r_2, s_2) = (r_1 + r_2, s_1 + s_2),   (r_1, s_1) · (r_2, s_2) = (r_1 r_2, s_1 s_2).


Let R′ = {(r, 0) : r ∈ R} and S′ = {(0, s) : s ∈ S}. Prove that R′ and S′ are ideals of
R × S isomorphic to R and S respectively.

Now suppose that T is a ring which contains ideals R and S having the property
that every element of T can be written uniquely in the form r + s, where r ∈ R and
s ∈ S. Prove the following assertions:


(a) R + S = T and R ∩ S = {0}.

(b) If r ∈ R and s ∈ S, then r and s commute (that is, rs = sr).

(c) The map θ from R × S to T given by (r, s)θ = r + s is an isomorphism.
(d) T is isomorphic to R × S.


Exercise 2.37 Let R be a ring, and a an element of R. Remember that (a) means the
ideal generated by a, which by definition is the smallest ideal of R containing a.


(a) Prove that (a) is the set of all elements of R of the form


na + sa + at + Σ_{i=1}^{m} s_i a t_i,


where m and n are integers, m > 0, and s, t, s_i, t_i are elements of R (and na has
its usual meaning).

(b) Suppose that R has an identity. Show that the terms na, sa, and at can 
be dropped from the expression above. 


(c) The element a is said to be central if it commutes with every element 
of R. Show that, if R has an identity and a is central, then 


(a) = aR = {ar : r ∈ R}.


(d) Give a description of (a) in the case where a is central but R does not 
necessarily have an identity. 


Exercise 2.38 Let F be a field and n a positive integer. Let R be the ring M_n(F)
of n × n matrices over F. Let a = E_{11} be the matrix with entry 1 in the first row
and column, and all other entries zero. By Theorem 2.31, we know that (a) = R. So,
by part (b) of the preceding exercise, every element of R can be written in the form
Σ_{i=1}^{m} s_i a t_i, for some elements s_i, t_i ∈ R. Show that there are elements of R which
cannot be expressed as the sum of fewer than n terms of the form s_i a t_i, for s_i, t_i ∈ R.


Exercise 2.39 An element e of a ring is said to be an idempotent if e^2 = e.


(a) Let e be an idempotent in a ring with identity. Show that 1 − e is also
an idempotent.


(b) Let R and S be rings with identity. Show that the elements (1, 0) and
(0, 1) are central idempotents of the direct product R × S, whose sum is the
identity of R × S.

(c) Conversely, suppose that T is a ring with identity and e is a central
idempotent of T with e ≠ 0, 1. Prove that T ≅ R × S, where R = eT and
S = (1 − e)T.


Exercise 2.40 An element r of a ring R is said to be nilpotent if r^n = 0 for some
positive integer n. 


(a) Prove that a non-zero nilpotent element is a zero divisor. Is the converse 
true? 


(b) Prove that, in a commutative ring, the set of nilpotent elements is an
ideal.

(c) Let n be a positive integer. Find all nilpotent elements of the ring Z_n of
integers mod n. (Since, by the previous part they form an ideal, they must consist
of all multiples of n* for some integer n* dividing n. Your job is to calculate n*
in terms of n.)


Exercise 2.41 Let R be a ring with identity. Prove that, if the element r € R is 
nilpotent, then 1 + r is a unit.


Exercise 2.42 Prove that, in a matrix ring M_n(R), any strictly upper triangular
matrix is nilpotent. 


3 Groups 


We now turn to the study of groups. A group is a set with a single binary 
operation. So groups are much less structured than rings. We will see that this 
gives a different flavour to the subject. 


Groups and subgroups 


3.1 Introduction. Groups resemble rings in many ways. The main difference 
is that the conditions defining a group are less stringent (only one operation and 
four axioms, half as many as for rings), so that examples of groups are more 
numerous and varied. But also, there is no ‘canonical example’ of a group, 
corresponding to the ring Z, on which to base our definitions and test our 
intuition. 

First, there is a problem of notation. Many examples of groups are based 
on number systems, as we will see. Sometimes, the operation is addition, and 
the identity element is 0; at other times, the operation is multiplication, and the 
identity is 1. So we define a group using terminology different from both of these, 
not carrying the freight of associations of plus or times. 

A group is a set G with a binary operation ∘ satisfying the following laws:


(G0) (Closure law): For all g, h ∈ G, g ∘ h ∈ G.

(G1) (Associative law): g ∘ (h ∘ k) = (g ∘ h) ∘ k for all g, h, k ∈ G.

(G2) (Identity law): There exists e ∈ G such that g ∘ e = e ∘ g = g for all
g ∈ G.

(G3) (Inverse law): For all g ∈ G, there exists h ∈ G with g ∘ h = h ∘ g = e.


After we defined rings, we gave some extra conditions that select special 
classes of rings. We do the same here, but there is only one such class to be 
defined. We say that a group is abelian, or commutative, if it satisfies: 


(G4) (Commutative law): g ∘ h = h ∘ g for all g, h ∈ G.


(The term ‘abelian’ is much more common than ‘commutative’ in this context; it 
commemorates the mathematician N. Abel. In fact, many things are named after 
mathematicians who had some involvement in their discovery; but the ultimate 
accolade is that the word has passed so much into common usage that we use a 
lower-case letter for it.) 
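
For a finite set, the laws (G0)–(G4) can be tested by brute force. The sketch below does this for any finite collection of elements and any Python function of two arguments playing the role of the operation; the names is_group and is_abelian are our own, and the method is only practical for small sets, since the associative law is checked on every triple.

```python
from itertools import product

def is_group(elements, op):
    """Check (G0)-(G3) for a finite set with a binary operation, by exhaustion."""
    elements = list(elements)
    # (G0) Closure.
    if any(op(g, h) not in elements for g, h in product(elements, repeat=2)):
        return False
    # (G1) Associativity.
    if any(op(g, op(h, k)) != op(op(g, h), k)
           for g, h, k in product(elements, repeat=3)):
        return False
    # (G2) Identity.
    identities = [e for e in elements
                  if all(op(g, e) == g and op(e, g) == g for g in elements)]
    if not identities:
        return False
    e = identities[0]
    # (G3) Inverses.
    return all(any(op(g, h) == e and op(h, g) == e for h in elements)
               for g in elements)

def is_abelian(elements, op):
    """Check (G4) in addition to the group laws."""
    elements = list(elements)
    return is_group(elements, op) and all(
        op(g, h) == op(h, g) for g, h in product(elements, repeat=2))

def add_mod6(a, b):
    return (a + b) % 6

def mul_mod6(a, b):
    return (a * b) % 6

print(is_abelian(range(6), add_mod6))   # the integers mod 6 under addition
print(is_group(range(6), mul_mod6))     # fails: 0 has no multiplicative inverse
print(is_group([1, 5], mul_mod6))       # the units mod 6 under multiplication
```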

At this point, you should stop and compare the axioms (G0)–(G4) with the
first five ring axioms (A0)–(A4). You will see that they are exact translations:
we have put ‘group G’ in place of ‘ring R’, ∘ for +, e for the zero element 0, and
used letters g, h, k instead of a, b, c. This brings us naturally to our first examples
of abelian groups....


3.2. Examples of groups. 


Example 1 It follows from our observation at the end of the last section that,
if R is any ring, then (R,+) (meaning the set R with the operation +, where 
we forget entirely about the multiplication), is an abelian group. So every ring 
gives us an abelian group, called the additive group of the ring. 

Conversely, we saw in Section 2.2 that, given any abelian group R, where the 
group operation is written as + and the identity as 0, we obtain a ring (a zero 
ring) by defining the ring multiplication by the rule 


ab = 0 for all a, b ∈ R.


So abelian groups are exactly the same as additive groups of rings. For 
this reason, in the study of abelian groups, it is customary to write the group 
operation as + rather than ∘, and the identity as 0.


Example 2: Groups of units Here is another construction of a group from 
a ring, this time using the multiplication. It is not true that the ring with the 
operation of multiplication forms a group. However, we do find a group as follows: 

Let R be a ring with identity. Let U(R) be the set of units of R. (Recall that
u is a unit if there exists v ∈ R such that uv = vu = 1.) Now (U(R), ·) is a
group. To show this, we have to check the axioms. But most of the work is done
for us. 

Just after the definition of units in Section 2.10, we proved Proposition 2.14, 
with three parts: 


• The product of two units is a unit. Thus, the units satisfy (G0).
• The identity is a unit. Thus, the units satisfy (G2).
• The inverse of a unit is a unit. Thus, the units satisfy (G3).


This leaves us with just the associative law to check; but the associative law 
holds for units because it holds for all elements of the ring, by (M1). So the 
claim is proved. 

The group U(R) is called the group of units of R. 

Here are a few examples of this construction. 


1. U(Z) = {+1, −1}. So this set is a group, having just two elements, under
the operation of multiplication.

2. If F is a field, then every non-zero element is a unit. Thus, F \ {0} is a group
under multiplication. This group is called the multiplicative group of the
field, often written as F^×.

3. Again let F be a field, and let n be a positive integer. Then the set M_n(F)
of n × n matrices over F is a ring. A matrix A is a unit if and only if
it is invertible (or non-singular); that is, det(A) ≠ 0. Thus, the group
U(M_n(F)) of units of M_n(F) consists of all the invertible n × n matrices. This
group is referred to as GL(n, F), and called the general linear group. (It is
linear because matrices are the topic of Linear Algebra, and general because
it is the largest possible group we could make out of n × n matrices with the
operation of matrix multiplication.)
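
For a small finite field the general linear group can be listed outright. The sketch below enumerates GL(2, Z_p) by testing the determinant of every 2 × 2 matrix over Z_p; it assumes that p is prime (so that Z_p is a field), stores a matrix as the tuple (a, b, c, d) of its entries read row by row, and uses the names gl2 and mat_mul, which are our own.

```python
from itertools import product

def gl2(p):
    """All invertible 2 x 2 matrices over Z_p (p assumed prime), as tuples (a, b, c, d)."""
    return [m for m in product(range(p), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % p != 0]

def mat_mul(m1, m2, p):
    """Product of two 2 x 2 matrices over Z_p."""
    a, b, c, d = m1
    e, f, g, h = m2
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)

G2 = gl2(2)
print(len(G2), len(gl2(3)))
# Closure under multiplication, checked exhaustively for p = 2.
print(all(mat_mul(x, y, 2) in G2 for x in G2 for y in G2))
```

The counts agree with the formula (p^2 − 1)(p^2 − p) for the order of GL(2, Z_p), a standard fact which is not proved in this section.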


Example 3: Permutations Let Ω be a set. A permutation of Ω is a one-to-one
and onto function (a bijection) π : Ω → Ω.


We write xπ for the image of the element x under the permutation π, rather
than π(x) as you might expect. The reasons are somewhat similar to those that
we discussed in the last chapter for writing homomorphisms ‘on the right’.

We define the operation of composition of permutations as follows: if π_1 and
π_2 are permutations, their composition π_1 ∘ π_2 is given by x(π_1 ∘ π_2) = (xπ_1)π_2.
In other words, apply π_1, then π_2.

This shows the reason for writing permutations on the right of their arguments.
If we wrote them on the left, then ‘first π_1, then π_2’ would be π_2(π_1(x)),
and we would either have to set (π_1 ∘ π_2)(x) = π_2(π_1(x)), or else redefine composition
to mean ‘first π_2, then π_1’. Life is complicated enough without that! But
you are hereby warned: some people do exactly that.

How do we describe a permutation? In the case where Ω is finite, there are
two commonly used notations, which we met in Chapter 1. Take Ω to be the set
{1, 2, ..., n}.

Two-line notation for the permutation π: We write the elements 1, 2, ..., n
in the top row of an array. Below the element x we write its image xπ under the
permutation. So, for example,


π = ( 1 2 3 4 5 6 )
    ( 2 5 3 6 1 4 )

is the permutation which maps 1 to 2, 2 to 5, 3 to 3, and so on.

This notation enables us to count the number of permutations. The image of
1 (the number written under 1) can be any of the n elements of Ω. When it is
chosen, the image of 2 can be any of the remaining n − 1 elements, the image of
3, any of the remaining n − 2, and so on. So the number of permutations is


n · (n − 1) · (n − 2) ··· 2 · 1 = n!,


the product of the numbers from 1 to n (or ‘n factorial’).

Cycle notation is more compact. Given the permutation π, start with the point 1;
open a bracket and write 1, followed by its image under π, followed by its image,
and so on, until the next step would return us to 1; then close the bracket. Then
pick the smallest number not used so far, and repeat the procedure with it;
continue until all numbers have been used. The sequences in brackets are called
the cycles of π.

For example, the permutation π described above in two-line notation would
be written as (1,2,5)(3)(4,6). The calculation goes, ‘1 maps to 2, to 5, back to
1; then 3 maps to itself; then 4 maps to 6, back to 4’. So π has three cycles.

Note that the point 3, which is fixed by the permutation, lies in a cycle with
just one element (a cycle of length 1). By convention, we simply omit cycles of
length 1. So π would be written as (1,2,5)(4,6). There is one exception to this
convention. If we applied it to the identity, then everything would be omitted,
and we would write nothing at all! Usually we put in a token cycle of length 1
and write the identity as (1).
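
The procedure for finding the cycles is easily mechanised. In the sketch below a permutation is stored as a dictionary sending each point x to its image xπ; this representation, and the names cycles, cycle_notation and compose, are our own choices for the illustration. Composition follows the convention of the text: apply the first permutation, then the second.

```python
def cycles(perm):
    """Cycles of a permutation given as a dict {x: image of x}, each cycle
    starting at the smallest point not used so far."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        result.append(tuple(cycle))
    return result

def cycle_notation(perm):
    """Cycle notation with cycles of length 1 omitted; the identity is written (1)."""
    nontrivial = [c for c in cycles(perm) if len(c) > 1]
    if not nontrivial:
        return "(1)"
    return "".join("(" + ",".join(map(str, c)) + ")" for c in nontrivial)

def compose(p1, p2):
    """The composition 'first p1, then p2', matching the right-action convention."""
    return {x: p2[p1[x]] for x in p1}

pi = {1: 2, 2: 5, 3: 3, 4: 6, 5: 1, 6: 4}
print(cycles(pi))
print(cycle_notation(pi))
print(cycle_notation(compose(pi, pi)))
```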

Now let Sym(Ω), the symmetric group on Ω, be the set of all permutations
of Ω, with the operation of composition. We claim that it is a group.


(G0) If π_1 and π_2 are permutations, then so is π_1 ∘ π_2.
• Proof that it is one-to-one: suppose that x(π_1 ∘ π_2) = y(π_1 ∘ π_2). Then
(xπ_1)π_2 = (yπ_1)π_2. Since π_2 is one-to-one, we see that xπ_1 = yπ_1; then
since π_1 is one-to-one, x = y.
• Proof that it is onto: Given x ∈ Ω, there exists y ∈ Ω such that yπ_2 = x,
since π_2 is onto; then there exists z ∈ Ω such that zπ_1 = y, since π_1 is
onto. Then z(π_1 ∘ π_2) = x.
(G1) x((π_1 ∘ π_2) ∘ π_3) = (x(π_1 ∘ π_2))π_3 = ((xπ_1)π_2)π_3,
x(π_1 ∘ (π_2 ∘ π_3)) = (xπ_1)(π_2 ∘ π_3) = ((xπ_1)π_2)π_3.
In other words, both (π_1 ∘ π_2) ∘ π_3 and π_1 ∘ (π_2 ∘ π_3) say ‘apply π_1, then π_2,
then π_3’.
(G2) The identity permutation ε defined by xε = x for all x (leaving everything
where it is) satisfies ε ∘ π = π = π ∘ ε for all permutations π.
(G3) If π is a permutation, it is a one-to-one and onto function, and hence has an
inverse function σ, where xσ = y if and only if yπ = x. Then π ∘ σ = σ ∘ π = ε.


If Ω = {1, 2, ..., n}, we write the symmetric group Sym(Ω) as S_n for brevity.
Thus, S_n is a group with n! elements. For example, S_3 consists of the six elements
(in cycle notation) (1) (the identity), (1,2,3), (1,3,2), (1,2), (2,3), and (1,3). It
turns out that S_3 is the smallest non-abelian group.
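
The claims about S_3 that can be checked by direct computation are that it has six elements, is closed under composition, and is not abelian. The sketch below does this, storing a permutation of {1, 2, 3} as the bottom row of its two-line notation; the names compose, S3, g and h are our own. It does not address the statement that no smaller group is non-abelian.

```python
from itertools import permutations

def compose(p1, p2):
    """'First p1, then p2', where a permutation of {1, ..., n} is the tuple
    whose entry in position x (counting from 1) is the image of x."""
    return tuple(p2[p1[i] - 1] for i in range(len(p1)))

S3 = list(permutations((1, 2, 3)))
g = (2, 1, 3)   # the transposition (1,2)
h = (1, 3, 2)   # the transposition (2,3)

print(len(S3))
print(all(compose(a, b) in S3 for a in S3 for b in S3))
print(compose(g, h), compose(h, g))
```

The last line prints two different permutations, the two 3-cycles (1,3,2) and (1,2,3), so g and h do not commute.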


Example 4: Automorphism groups Let R be a ring. An automorphism
of R is an isomorphism θ : R → R; in other words, it is a permutation of R
which happens also to be a homomorphism satisfying


(x + y)θ = xθ + yθ,   (xy)θ = (xθ)(yθ),


using the notation of Example 3.
Let Aut(R) be the set of all automorphisms of R. Then Aut(R) is a group,
the automorphism group of R.

(G0) If θ_1 and θ_2 are automorphisms, so is θ_1 ∘ θ_2: for


(x + y)(θ_1 ∘ θ_2) = ((x + y)θ_1)θ_2
                   = (xθ_1 + yθ_1)θ_2
                   = (xθ_1)θ_2 + (yθ_1)θ_2
                   = x(θ_1 ∘ θ_2) + y(θ_1 ∘ θ_2),


and similarly for multiplication.
(G1) As in Example 3.
(G2) The identity permutation is an automorphism of R.
(G3) The inverse of an automorphism is an automorphism.


Remark There is absolutely nothing special about rings here, unlike the 
situation in Examples 1 and 2. If 𝒳 is any class of mathematical objects for
which we can formulate the notion of homomorphism (or isomorphism), then
Aut(X) is a group for any X ∈ 𝒳.

For example, we will be able to talk about the automorphism group of a 
group, once we have defined group homomorphisms. 


Example 5 Since a group is a set with a binary operation, a finite group can 
be specified by its operation table, or Cayley table, as it is usually called in 
this case, after Arthur Cayley, who pioneered the use of such tables. 

For example, here is the Cayley table of a group with four elements: 


∘ | e a b c
--+--------
e | e a b c
a | a e c b
b | b c e a
c | c b a e


You may recognise this as the additive group of the field with four elements
which we constructed in Section 2.16, or of the Boolean ring of subsets of {1, 2}
in Section 2.2. It is an important enough group to have a name: it is the Klein
group, or the four-group. The German translation of the latter name, Vierergruppe,
gives rise to the notation V_4 for the group. (The first name commemorates
the mathematician Felix Klein, not the fact that it is quite a small group.)


We will see further examples later on. 


3.3. Properties of groups. Before proceeding further, we will change our 
notation for groups. As we saw, abelian groups are the same as additive groups 
of rings, and are usually written with the symbol + for the group operation 
and 0 for the identity. However, most other groups have more in common with 
multiplicative systems. Accordingly, we will use juxtaposition instead of ∘ for
the group operation. Often, we refer to gh as the product of g and h. We also
use 1 instead of e for the identity element of a group, and g^{-1} for the inverse of
the element g. (We will see the uniqueness of identity and inverses shortly.)

Many of these properties will remind you of similar properties for rings. 
Usually the proofs are almost identical; but they will be repeated here. 


1. Products The product g_1 ··· g_n, strictly speaking, requires the insertion of
brackets so that it can be evaluated, and potentially has many values according 
to how the brackets are inserted. However, in Proposition 2.1, we showed that 
the value of the sum of n elements of a ring is independent of the bracketing 
used to work it out. We remarked that the proof uses only the associative law. 
So the same is true for the product of n elements of a group. 


2. Uniqueness of identity The identity element of a group is unique. 
For suppose that e and f are two identities in a group. Then 


e = ef = f.
We will denote the unique identity element by 1. 


3. Uniqueness of inverses The inverse of any group element is unique. 
For, if h and k are both inverses of g, then 


h = h1 = h(gk) = (hg)k = 1k = k.


We will denote the inverse of g by g^{-1}.


Now our notation for a group is consistent with the notation for the group 
of units of a ring introduced in Section 2.10. 


4. Properties of inverses (a) (gh)^{-1} = h^{-1}g^{-1}. [For


(gh)(h^{-1}g^{-1}) = g(hh^{-1})g^{-1} = g1g^{-1} = gg^{-1} = 1,


and similarly the other way around.]

(b) 1^{-1} = 1, clearly.

(c) (g^{-1})^{-1} = g. [For the equations gg^{-1} = g^{-1}g = 1 show that g has the
properties of the inverse of g^{-1}; and inverses are unique.]


5. Cancellation Laws 


(C1) (Left Cancellation Law): If gx = gy, then x = y. 
(C2) (Right Cancellation Law): If xg = yg, then x = y. 


For suppose that gx = gy. Then 
$$x = 1x = (g^{-1}g)x = g^{-1}(gx) = g^{-1}(gy) = (g^{-1}g)y = 1y = y.$$
The proof of the right cancellation law is similar. 


6. Exponents Now we define $g^n$ for any element $g \in G$, and any integer $n$.
If $n > 0$, we define $g^n$ to be the product of $n$ factors equal to $g$. (As we noted
in Point 1 above, this is well defined.) For $n = 0$, we set $g^0 = 1$, the identity.
Finally, for $n < 0$, say $n = -m$, we define $g^{-m} = (g^m)^{-1}$. Now we have laws of
exponents, as in elementary algebra:

(a) $g^m \cdot g^n = g^{m+n}$.
(b) $(g^m)^n = g^{mn}$.
(c) If $gh = hg$, then $(gh)^n = g^nh^n$.


Note that the third of these laws does not hold for arbitrary elements $g, h$. (We
say that $g$ and $h$ commute if $gh = hg$. Like regular commuters, if $g$ and $h$
commute, then they can move back and forth in any product of $g$s and $h$s. Every
time we find a factor $\cdots hg \cdots$ in the product, we can replace it by $\cdots gh \cdots$.
In this way, we can bring the $g$s in front of the $h$s. In particular, the product
$(gh)^n = ghgh \cdots gh$ is equal to $gg \cdots ghh \cdots h = g^nh^n$.) This proves the assertion
for positive $n$. The remaining cases are left as an exercise.
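The role of the commuting hypothesis in law (c) can be checked on a small example. The following is only an illustrative sketch: it assumes permutations of {0, 1, 2} stored as tuples and multiplied from left to right, and the helper names `compose` and `power` are hypothetical, not notation from the text.

```python
def compose(g, h):
    """Product gh: apply g first, then h (left-to-right convention)."""
    return tuple(h[g[i]] for i in range(len(g)))

def power(g, n):
    """g^n for n >= 0, by repeated multiplication."""
    result = tuple(range(len(g)))  # the identity permutation
    for _ in range(n):
        result = compose(result, g)
    return result

# Position i of each tuple holds the image of i.
g = (1, 0, 2)   # swaps 0 and 1
h = (0, 2, 1)   # swaps 1 and 2
r = (1, 2, 0)   # the 3-cycle 0 -> 1 -> 2 -> 0
r2 = power(r, 2)

# g and h do not commute, and (gh)^2 differs from g^2 h^2.
law_c_holds_for_g_h = power(compose(g, h), 2) == compose(power(g, 2), power(h, 2))

# r and r^2 commute, and law (c) holds for them.
law_c_holds_for_r_r2 = power(compose(r, r2), 2) == compose(power(r, 2), power(r2, 2))
```

Here `g` and `h` are two transpositions whose product is a 3-cycle, so squaring the product does not give the identity, whereas both squares are the identity.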


3.4 Subgroups. Let $G$ be a group. A subgroup of $G$ is a subset of $G$ which,
using the same operation as in $G$, is itself a group. We write $H \le G$ to indicate
that $H$ is a subgroup of $G$ (as opposed to $H \subseteq G$, which just means that $H$ is
a subset of $G$). If $H$ is a subgroup of $G$ which is not the whole of $G$, we write
$H < G$.

Let us consider the group axioms for H, a subset of G. 


(G0) Closure requires that the product of any two elements of $H$ is in $H$.

(G1) Since the associative law holds for any elements of $G$, it certainly holds
for any elements of the subset $H$.

(G2) We require that the identity 1 of $G$ should belong to $H$.

(G3) We require that the inverse $h^{-1}$ of any element $h \in H$ should also belong
to $H$.


So the associative law comes free, and we only have to check the other three
axioms. If $H$ is non-empty, we can dispense with the identity. For assume the
closure and inverse laws, and take any $h \in H$. Then $h^{-1} \in H$ by the inverse law,
and so $hh^{-1} = 1 \in H$ by the closure law. Our conclusion is as follows:


Theorem 3.1 (First Subgroup Test) Let H be a non-empty subset of the 
group G. Then H is a subgroup of G if and only if 


(a) for all $h_1, h_2 \in H$, we have $h_1h_2 \in H$;
(b) for all $h \in H$, we have $h^{-1} \in H$.


Just as for rings, we can replace these two tests by a single one: 


Theorem 3.2 (Second Subgroup Test) Let $H$ be a non-empty subset of
a group $G$. Then $H$ is a subgroup if and only if, for all $h_1, h_2 \in H$, we have
$h_1h_2^{-1} \in H$.

Proof Clearly the condition holds if $H$ is a subgroup. So assume that it does
hold. Take any $h \in H$. Then $hh^{-1} = 1 \in H$; and so $1h^{-1} = h^{-1} \in H$. So we have
condition (b) of the First Subgroup Test. Now take $h_1, h_2 \in H$. Then $h_2^{-1} \in H$,
and so $h_1(h_2^{-1})^{-1} = h_1h_2 \in H$. So condition (a) holds too.
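The Second Subgroup Test is easy to run mechanically on a finite example. The sketch below assumes the ambient group is the group of units of the integers mod 15 under multiplication; the function name is hypothetical, and the modular inverse `pow(b, -1, n)` needs Python 3.8 or later.

```python
def passes_second_subgroup_test(H, n):
    """Check that H is non-empty and h1 * h2^(-1) lies in H for all h1, h2 in H.

    H is assumed to be a set of units modulo n, so each inverse exists.
    """
    H = set(H)
    return bool(H) and all((a * pow(b, -1, n)) % n in H for a in H for b in H)

klein = {1, 4, 11, 14}      # every element squares to 1 modulo 15
not_closed = {1, 2, 4}      # 1 * 2^(-1) = 8 is missing

klein_is_subgroup = passes_second_subgroup_test(klein, 15)
not_closed_is_subgroup = passes_second_subgroup_test(not_closed, 15)
```

A single loop over pairs replaces the separate closure and inverse checks of the First Subgroup Test.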


Remark If the group operation is written as +, then $h_1h_2^{-1}$ would be written
as $h_1 - h_2$. So this condition is the exact counterpart of the similar one in the
Second Subring Test. We see this in the following example:


Example Let $G$ be the additive group of the ring $\mathbb{Z}$. Find all subgroups of $G$.

We already found that any subring, and any ideal, is of the form $n\mathbb{Z}$ (all
multiples of $n$) for some integer $n$. It turns out that the subgroups are exactly
the same.

First, we use the Second Subgroup Test to show that $n\mathbb{Z}$ is a subgroup.
Clearly it is non-empty. Take two elements of $n\mathbb{Z}$, say $nx$ and $ny$. Their difference
is $nx - ny = n(x - y) \in n\mathbb{Z}$. So $n\mathbb{Z}$ passes the test.

Now let $H$ be an arbitrary subgroup. We follow the strategy we used for
subrings. If $H$ consists just of 0, then $H = 0\mathbb{Z}$; so suppose not. The inverse
of a negative number is positive, so $H$ must contain positive numbers; let $n$
be the smallest positive number in $H$. Then any positive multiple of $n$ can be
obtained by adding $n$ the appropriate number of times, and so is in $H$. Then
$n(-x) = -nx$ for positive $x$, so negative multiples are in $H$; and clearly $0 \in H$.
So $H$ contains $n\mathbb{Z}$.

Conversely, take any number $m \in H$; we wish to show that $n$ divides $m$.
Divide $m$ by $n$; that is, write $m = nq + r$, where $0 \le r < n$. By subtracting $n$
$q$ times from $m$ (or adding it $-q$ times, if $q$ is negative), we see that $r \in H$. Now
$n$ is the smallest positive number in $H$, and $r < n$; so necessarily $r = 0$, and $n$
divides $m$, as required. So $H = n\mathbb{Z}$.


Exercise 3.1 Which of the following structures $(G, \circ)$ are groups?


(a) $G = \mathcal{P}(X)$, $A \circ B = A \triangle B$ (symmetric difference);
(b) $G = \mathcal{P}(X)$, $A \circ B = A \cup B$;
(c) $G = \mathcal{P}(X)$, $A \circ B = A \setminus B$ (difference);
(d) $G = \mathbb{R}$, $x \circ y = xy$;
(e) $G$ is the set of positive real numbers, $x \circ y = xy$;
(f) $G = \{z \in \mathbb{C} : |z| = 1\}$, $x \circ y = xy$;
(g) $G$ is the interval $(-c, c)$,

$$x \circ y = \frac{x + y}{1 + xy/c^2}$$

[this example describes the addition of velocities in Special Relativity];

(h) $G = \{a, b\}$, $a \circ a = a \circ b = a$, $b \circ a = b \circ b = b$;

(i) $G = \{a, b\}$, $a \circ b = b \circ a = a$, $a \circ a = b \circ b = b$.

[In (a)-(c), $\mathcal{P}(X)$ is the set of all subsets of $X$, where $X$ is a set with at least two
elements.]


Exercise 3.2 (a) Show that the symmetric group $S_3$ is non-abelian, by finding two
of its elements which do not commute.
(b) Show that $S_n$ is non-abelian for any $n \ge 3$.


Exercise 3.3 Show that the following set of six matrices is a group: 


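$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\ \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix},\ \begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix},\ \begin{pmatrix} 1 & 0 \\ -1 & -1 \end{pmatrix},\ \begin{pmatrix} -1 & -1 \\ 0 & 1 \end{pmatrix},\ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

Is it an abelian group?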


Exercise 3.4 If $A$ is a subset of a group $G$, we let $A^{-1} = \{a^{-1} : a \in A\}$. Also, for
$A, B \subseteq G$, we let $AB = \{ab : a \in A, b \in B\}$. Prove that $A$ is a subgroup of the group
$G$ if and only if $AA^{-1} \subseteq A$.


Exercise 3.5 Let $U(R)$ be the group of units of the ring $R = \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}$.
Is $U(R)$ finite or infinite?


Exercise 3.6 Let $R$ be a commutative ring with identity element 1. Let $S$ be the set
of all solutions of $x^2 = 1$ in $R$. Show that $S$, with the operation of multiplication, is an
abelian group.


Exercise 3.7 (a) Show that $(gh)^2 = g^2h^2$ if and only if $gh = hg$.

(b) Show that $(gh)^{-1} = g^{-1}h^{-1}$ if and only if $gh = hg$.

*(c) Show that, if there exists a number $m$ such that the equation $(gh)^n = g^nh^n$
holds for $n = m$, $n = m + 1$ and $n = m + 2$, then $gh = hg$.

[Since $(gh)^0 = 1 = g^0h^0$ and $(gh)^1 = gh = g^1h^1$, we see that part (c) is ‘best
possible’ (the equation holding for two consecutive values does not suffice to make $g$
and $h$ commute), and also that (a) and (b) are special cases of (c), taking $m = 0$,
$m = -1$ respectively.]


Exercise 3.8 Let R be a ring. Show that Aut(R) is a subgroup of Sym(R). 


Exercise 3.9 (*) Let $G$ be a set with a binary operation $\circ$. (As usual, this presupposes
that the closure law (G0) holds.) Suppose that in addition the following three axioms
hold:


(a) the associative law (G1), that is, $(g \circ h) \circ k = g \circ (h \circ k)$ for all $g, h, k \in G$;
(b) there exists $e \in G$ such that $e \circ g = g$ for all $g \in G$;
(c) for any $g \in G$, there exists $h \in G$ such that $h \circ g = e$ (where $e$ is as in (b)).


Prove that $G$ is a group.


Exercise 3.10 (*) Let $G$ be a set with a binary operation $\circ$. Suppose that $G$ satisfies
conditions (a) and (b) of Exercise 3.9, and also the following:


(c') for any $g \in G$, there exists $h \in G$ such that $g \circ h = e$ (where $e$ is as in (b)).


Show that $G$ need not be a group. [Hint: Take the operation defined by $g \circ h = h$ for
all $g, h \in G$.]


Exercise 3.11 Prove the laws of exponents in a group (point 6 of Section 3.3).

Exercise 3.12 (*) A group contains elements $a, b, c, d, e$ (none of them equal
to the identity) such that


$$ab = c, \quad bc = d, \quad cd = e, \quad de = a, \quad ea = b.$$


Find the orders of the elements $a, b, c, d, e$.


Subgroups and cosets 


3.5 Cosets. Given a subgroup of a group, we can partition the group into 
cosets, much as we did for subrings of a ring. But there is a complication: because 
of the non-commutativity, a subgroup has two different kinds of cosets, left cosets 
and right cosets. 

Let $H$ be a subgroup of a group $G$. We define two relations $\sim_L$ and $\sim_R$ on $G$,
as follows:


• $g_1 \sim_L g_2$ if and only if $g_1^{-1}g_2 \in H$;
• $g_1 \sim_R g_2$ if and only if $g_2g_1^{-1} \in H$.


Each of these is an equivalence relation. Here is the proof for $\sim_L$; try $\sim_R$ for
yourself.


(Eq1) $g_1^{-1}g_1 = 1 \in H$, so $g_1 \sim_L g_1$; $\sim_L$ is reflexive.
(Eq2) If $g_1^{-1}g_2 \in H$, then

$$(g_1^{-1}g_2)^{-1} = g_2^{-1}(g_1^{-1})^{-1} = g_2^{-1}g_1 \in H,$$

so $g_1 \sim_L g_2$ implies $g_2 \sim_L g_1$; $\sim_L$ is symmetric.
(Eq3) If $g_1^{-1}g_2 \in H$ and $g_2^{-1}g_3 \in H$, then

$$(g_1^{-1}g_2)(g_2^{-1}g_3) = g_1^{-1}(g_2g_2^{-1})g_3 = g_1^{-1}g_3 \in H,$$

so $g_1 \sim_L g_2$ and $g_2 \sim_L g_3$ imply $g_1 \sim_L g_3$; $\sim_L$ is transitive.


The equivalence classes of $\sim_L$ are called left cosets, while those of $\sim_R$ are
called right cosets. We now give a more usable description of these cosets.


Proposition 3.3 Any left coset of the subgroup $H$ of $G$ has the form
$gH = \{gx : x \in H\}$, while any right coset has the form $Hg = \{xg : x \in H\}$.


Proof We prove this for right cosets; the argument for left cosets is very similar.
Let $X$ be the equivalence class of $\sim_R$ containing the element $g \in G$, so that

$$X = \{y \in G : g \sim_R y\} = \{y \in G : yg^{-1} \in H\}.$$

If $y \in X$, then $yg^{-1} = x \in H$, so $y = xg \in Hg$; and conversely.

For example, let $G = S_3$, the symmetric group on the set $\{1, 2, 3\}$, and let $H$
be the subgroup $S_2 = \{(1), (1,2)\}$. The left cosets of $H$ are


$$H = \{(1), (1,2)\},$$
$$(1,2,3)H = \{(1,2,3), (2,3)\},$$
$$(1,3,2)H = \{(1,3,2), (1,3)\};$$

while the right cosets are

$$H = \{(1), (1,2)\},$$
$$H(1,2,3) = \{(1,2,3), (1,3)\},$$
$$H(1,3,2) = \{(1,3,2), (2,3)\}.$$

[In more detail: The left coset $(1,2,3)H$ consists of the two elements
$(1,2,3)(1)$ and $(1,2,3)(1,2)$. The first is $(1,2,3)$, since $(1)$ is the identity. To work
out the second, remember that we compose permutations from left to right. So
the composite maps 1 to 2, back to 1; 2 to 3 which is then fixed; and 3 to 1 to
2. The result is the permutation $(2,3)$.]
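The coset lists above can be reproduced by brute force. This is a minimal sketch under stated assumptions: a permutation of {1, 2, 3} is stored as the tuple of images of 1, 2, 3 (so the cycle (1,2,3) becomes `(2, 3, 1)`), products are taken left to right as in the text, and `compose` is a hypothetical helper name.

```python
from itertools import permutations

def compose(g, h):
    """Product gh on {1, 2, 3}: apply g first, then h."""
    return tuple(h[g[i] - 1] for i in range(3))

G = list(permutations((1, 2, 3)))   # all six elements of S_3
H = [(1, 2, 3), (2, 1, 3)]          # the identity and the transposition (1,2)

# gH = {gx : x in H} and Hg = {xg : x in H}; duplicates collapse in the set.
left_cosets = {frozenset(compose(g, x) for x in H) for g in G}
right_cosets = {frozenset(compose(x, g) for x in H) for g in G}
```

Both collections contain three cosets of size 2, but they are different partitions of the group: for instance `(2, 3, 1)` shares a left coset with `(1, 3, 2)`, that is (2,3), and a right coset with `(3, 2, 1)`, that is (1,3).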

Note that the left and right cosets are not the same in this case. However, 
there are equally many cosets of each type. This is not an accident. 


Theorem 3.4 Let $H$ be a subgroup of the group $G$. Then there is a bijection
between the left cosets and the right cosets of H in G; so there are equally many 
of each. 


Proof For any set $X$ of elements of $G$, we put $X^{-1} = \{x^{-1} : x \in X\}$. Then
$(X^{-1})^{-1} = X$. We show that, if $X$ is a right coset, then $X^{-1}$ is a left coset, and
vice versa. So the correspondence $X \leftrightarrow X^{-1}$ is the required bijection.

First note that $H^{-1} = H$. For $H$ contains the inverse of each of its elements,
so $H^{-1} \subseteq H$. Now, taking the inverse of both sides, we find that
$H = (H^{-1})^{-1} \subseteq H^{-1}$. So equality holds.

Now let $X = Hg = \{hg : h \in H\}$ be a right coset. Then

$$X^{-1} = \{g^{-1}h^{-1} : h \in H\} = g^{-1}H^{-1} = g^{-1}H$$

is a left coset. The reverse implication is similar.


In the example, we see that the inverses of the first, second, and third left 
cosets are the first, third, and second right cosets, respectively. 


3.6 Orders; Lagrange’s Theorem. The order of a group G, written |G|, 
is the cardinality of the set G, the number of elements in the group. This may 
be finite or infinite. Thus, the order of the symmetric group $S_n$ is $n!$, while the
order of the additive group of Z is infinite. 

If $H$ is a subgroup of $G$, the index of $H$ in $G$, written $|G : H|$, is the number
of right cosets of $H$ in $G$. (By Theorem 3.4, we could have used left cosets, and
the answer would be the same.)

Theorem 3.5 (Lagrange’s Theorem) Let $H$ be a subgroup of the finite group
$G$. Then

$$|G| = |H| \cdot |G : H|.$$

In particular, the order of $H$ divides that of $G$.


Proof We know that $G$ can be written as the disjoint union of the right cosets
of $H$. The number of cosets is $|G : H|$. The proof is finished if we can show that
the number of elements in each coset is equal to $|H|$.

Define a function $f : H \to Hg$ by $f(h) = hg$. By our characterisation of
right cosets, $f$ is onto (this says that every element of $Hg$ is of the form $hg$ for
$h \in H$). Now we show that $f$ is one-to-one. Suppose that $f(h_1) = f(h_2)$. Then
$h_1g = h_2g$. By the right cancellation law, $h_1 = h_2$. So $f$ is indeed one-to-one,
and is a bijection. Thus $|Hg| = |H|$, and the proof is complete.


Remark This gives another proof of Theorem 3.4 in the case of a finite group.
For exactly the same argument shows that each left coset has $|H|$ elements, so
the numbers of left and right cosets are both equal to $|G|/|H|$.
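The counting argument in Lagrange's Theorem can be watched on a concrete group. The sketch below assumes, purely for illustration, the group of units of the integers mod 15 and the subgroup {1, 4, 11, 14}; neither choice comes from the text.

```python
from math import gcd

n = 15
G = [x for x in range(1, n) if gcd(x, n) == 1]   # the units modulo 15
H = [1, 4, 11, 14]                               # a subgroup of G

# Right cosets Hg = {hg : h in H}; equal cosets collapse in the set.
cosets = {frozenset(h * g % n for h in H) for g in G}
index = len(cosets)

every_coset_has_size_of_H = all(len(c) == len(H) for c in cosets)
cosets_partition_G = sorted(x for c in cosets for x in c) == G
lagrange_holds = len(G) == len(H) * index
```

The two cosets found are H itself and {2, 7, 8, 13}, so the index is 2 and 8 = 4 · 2.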


We now define the order of an element of a group. This is quite a different 
concept from the order of the group; but we will see that there is a connection. 

Let $g$ be an element of a group $G$. If there exists a positive integer $n$ such
that $g^n = 1$, then the least such positive $n$ is called the order of $g$. If no such $n$
exists, then we say that $g$ has infinite order.


Theorem 3.6 (a) Let $g$ be an element of the group $G$. Then the set

$$\{g^m : m \in \mathbb{Z}\}$$

is a subgroup of $G$; its order is equal to the order of $g$.
(b) The order of any element of a finite group $G$ divides the order of $G$.
(c) If $g$ has finite order $n$, then $g^m = 1$ if and only if $n$ divides $m$.


Proof Let $H$ be the set $\{g^m : m \in \mathbb{Z}\}$. Take two elements of $H$, say $g^p$ and $g^q$.
Then $g^p(g^q)^{-1} = g^{p-q} \in H$. By the Second Subgroup Test, $H$ is a subgroup.

If $g$ has infinite order, then all of the powers $g^m$ are distinct, since $g^p = g^q$
for $p > q$ implies $g^{p-q} = 1$. So $H$ is infinite.

Suppose, on the other hand, that $g$ has order $n$. If $m = nk$, then
$g^m = (g^n)^k = 1$. Conversely, suppose that $g^m = 1$. Write $m = nq + r$ with $0 \le r < n$.
Then $g^r = g^{m-nq} = 1$. We cannot have $r > 0$, since $n$ is the smallest positive
integer such that $g^n = 1$. So $r = 0$ and $n$ divides $m$. This proves (c). Now
(a) follows, since the argument shows that any power of $g$ is equal to one of
$g^0 = 1, g^1 = g, g^2, \ldots, g^{n-1}$, and these $n$ elements are distinct (if $g^i = g^j$
with $0 \le j < i < n$, then $g^{i-j} = 1$ with $0 < i - j < n$, contradicting the choice of $n$).

Now (b) follows by applying Lagrange’s Theorem to the subgroup $H$.
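Parts (b) and (c) of Theorem 3.6 can be checked numerically. The following sketch again assumes the group of units of the integers mod 15, which has order 8; the function `order` is a hypothetical helper that simply multiplies until it reaches the identity.

```python
from math import gcd

def order(g, n):
    """Order of g in the group of units modulo n (assumes n >= 2 and gcd(g, n) == 1)."""
    k, x = 1, g % n
    while x != 1:
        x = x * g % n
        k += 1
    return k

n = 15
units = [x for x in range(1, n) if gcd(x, n) == 1]
orders = {g: order(g, n) for g in units}

# (b): every element order divides the group order.
orders_divide_group_order = all(len(units) % k == 0 for k in orders.values())

# (c): 2 has order 4, so 2^m = 1 exactly when 4 divides m.
powers_match = all((pow(2, m, n) == 1) == (m % 4 == 0) for m in range(1, 30))
```

No element of this group has order 8, so the group is not cyclic even though all the element orders divide 8.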


Definition We use the notation $\langle g \rangle$ for the subgroup

$$\{g^n : n \in \mathbb{Z}\}$$

used in the preceding proof, and call it the subgroup generated by $g$. [Note:
This is not the same as the ideal $(a)$ generated by the element $a$ of a ring. If we
took all multiples of $g$ by elements of $G$, we would obtain the whole group!]


3.7 Cyclic groups. A cyclic group is a group generated by a single 
element; that is, a group consisting of all the powers of one of its elements. 


Example 1 The additive group of Z is a cyclic group of infinite order, 
while the additive group of $\mathbb{Z}_n$ is a cyclic group of finite order $n$ for any positive
integer n. 

For any positive integer m can be written as 1+1+---+ 1 (m terms); this 
is the mth power of 1 (but written in additive notation!) The negative integers 
are the inverses of the positive ones, and so are negative powers of 1; zero is the 
zero-th power of 1. 


Example 2 The set of complex numbers which are nth roots of unity forms a 
group with the operation of multiplication; this group is cyclic of order n. 

For the $n$th roots of 1 are the complex numbers $e^{2\pi ik/n}$ for $k = 0, 1, \ldots, n-1$;
they are all powers of $e^{2\pi i/n}$.


There is only one type of cyclic group of each possible order. (When we have 
formulated the notion of isomorphism for groups, we will see that any two cyclic 
groups of the same order are isomorphic.) We will denote the cyclic group of
order $n$ by $C_n$ (including the possibility $n = \infty$).

Not every group is cyclic. In the first place, cyclic groups are necessarily
abelian: for $g^mg^n = g^{m+n} = g^ng^m$ for all $m, n$. And not all abelian groups are
cyclic. The Klein group $V_4$ (see Example 5 in Section 3.2) is abelian but not
cyclic. Indeed, we can recognise cyclic groups as follows:


Proposition 3.7 A finite group G of order n is cyclic if and only if it contains 
an element of order n. 


Proof If $g$ has order $n$, then the elements $g^0 = 1, g^1 = g, \ldots, g^{n-1}$ of $\langle g \rangle$ are
all distinct; since there are $n$ of them, they comprise all of $G$. The converse is
clear.


The Klein group has order 4, but all its elements except the identity have 
order 2. So it is not cyclic. 

We can describe completely the subgroups of cyclic groups. For the infinite 
cyclic group (the additive group of Z), we already did this in Section 3.4. For 
finite cyclic groups, the following result holds: 


Theorem 3.8 Let $G = \langle g \rangle$ be a cyclic group of order $n$. Then, for each divisor
$m$ of $n$, there is a unique subgroup of $G$ of order $m$, which is a cyclic group
generated by $g^{n/m}$; and these are all the subgroups of $G$.

Proof Let $H$ be a subgroup of $G$, and let $k$ be the smallest positive integer
such that $g^k \in H$. We claim that $g^l \in H$ if and only if $k$ divides $l$. The proof
is along now familiar lines. If $l = kq$, then $g^l = (g^k)^q \in H$. Conversely, suppose
$g^l \in H$, and let $l = kq + r$ with $0 \le r < k$. Then $g^r = g^{l-kq} \in H$, and so $r = 0$.
In particular, $g^n = 1 \in H$, so $k$ divides $n$. Putting $m = n/k$, we see that $H$ is
generated by $g^{n/m}$, and that $H$ has $m$ elements $g^0 = 1, g^k, g^{2k}, \ldots, g^{(m-1)k}$.
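Theorem 3.8 can be confirmed by exhaustive search in a small case. The sketch below works in additive notation in the integers mod 12 (so "the power $g^{n/m}$" becomes the multiple n/m of the generator 1); the choice n = 12 and the helper names are assumptions made for the illustration.

```python
from itertools import combinations

def is_subgroup(S, n):
    """Second Subgroup Test in additive notation: S non-empty, a - b in S."""
    return bool(S) and all((a - b) % n in S for a in S for b in S)

def generated(a, n):
    """The cyclic subgroup of Z_n generated by a."""
    return frozenset(a * k % n for k in range(n))

n = 12
divisors = [m for m in range(1, n + 1) if n % m == 0]

# Every subgroup, found by testing all non-empty subsets of Z_12.
found = {
    frozenset(S)
    for r in range(1, n + 1)
    for S in combinations(range(n), r)
    if is_subgroup(frozenset(S), n)
}

# The subgroups predicted by the theorem: one of order m for each divisor m.
predicted = {generated(n // m, n) for m in divisors}
```

The search finds exactly six subgroups, of orders 1, 2, 3, 4, 6 and 12, and they coincide with the predicted ones.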


Exercise 3.13 Let H be a subgroup of a group G. Show that any left coset of H is 
equal to a right coset of some subgroup (not necessarily H). 


Exercise 3.14 (a) Let $g$ be an element of a group $G$. Let $I = \{n \in \mathbb{Z} : g^n = 1\}$. Prove
that $I$ is an ideal of $\mathbb{Z}$. Use this to show that (a) and (c) of Theorem 3.6 hold.
(b) Apply a similar idea to the proof of Theorem 3.8.


Exercise 3.15 An involution in a group $G$ is an element $g$ having order 2; that is,
such that $g^2 = 1$ but $g \ne 1$.


(a) Show that, if G is a group of odd order, then G contains no involutions. 

(b) Show that, if G is a group of even order, then G contains at least one involution. 
[Hint: Pair up the elements of G with their inverses. The only elements which are 
unpaired (because they are equal to their own inverses) are the identity and the 
involutions.] 


Exercise 3.16 (*) This exercise generalises part (b) of the preceding one. Prove the
following:


Theorem 3.9 (Cauchy’s Theorem) Let $G$ be a finite group, and $p$ a prime number
which divides the order of $G$. Then $G$ contains an element of order $p$.


Hint: Let

$$\Omega = \{(g_1, g_2, \ldots, g_p) : g_1g_2 \cdots g_p = 1\},$$

a subset of the Cartesian power $G^p$. Let $\pi$ be the following permutation of $\Omega$:

$$(g_1, g_2, \ldots, g_p)\pi = (g_2, \ldots, g_p, g_1).$$

In other words, $\pi$ shifts every coordinate back one place and moves the first coordinate
to the end. Show that $\pi$ really is a permutation of $\Omega$, in other words, that if
$(g_1, g_2, \ldots, g_p) \in \Omega$, then also $(g_2, \ldots, g_p, g_1) \in \Omega$.
Now decompose $\Omega$ into cycles of the permutation $\pi$. Show that


(a) if $g^p = 1$, then $(g, g, \ldots, g) \in \Omega$ and this element is fixed by $\pi$;
(b) all other elements of $\Omega$ belong to cycles of size $p$.


Show that $|\Omega| = |G|^{p-1}$. It follows that $|\Omega|$ is divisible by $p$. Since all cycles have size
1 or $p$, the number of fixed points is also divisible by $p$. But $(1, 1, \ldots, 1)$ is a fixed point;
so there are at least $p - 1$ fixed points of the form $(g, g, \ldots, g)$ where $g$ has order $p$.


Exercise 3.17 (*) In this question we use Lagrange’s Theorem to prove Fermat’s
Little Theorem.

Let $m$ be a positive integer. Define $\phi(m)$ to be the number of integers $x$ satisfying
$0 \le x \le m - 1$ and $\gcd(x, m) = 1$. The function $\phi$ is called Euler’s totient function.

(a) Show that the order of the group of units of $\mathbb{Z}_m$ is $\phi(m)$.
(b) Deduce that, if $\gcd(x, m) = 1$, then $x^{\phi(m)} \equiv_m 1$.
(c) If $p$ is a prime number, show that $\phi(p) = p - 1$. Deduce that, if $p$ does not divide
$x$, then $x^{p-1} \equiv_p 1$.
(d) Hence show that, if $p$ is a prime number, then $x^p \equiv_p x$ for all integers $x$.


Homomorphisms and normal subgroups 


3.8 Definitions. The definition of a group homomorphism is almost iden- 
tical to that of a ring homomorphism; the only difference is the simplification 
resulting from the fact that there is only one operation. As for rings, we write 
group homomorphisms ‘on the right’. 

Let $G$ and $H$ be groups. A homomorphism $\theta : G \to H$ is a function $\theta$ from
$G$ to $H$ that satisfies the condition

$$(g_1g_2)\theta = (g_1\theta)(g_2\theta)$$

for all $g_1, g_2 \in G$.
It follows from the definition that, if $\theta$ is a homomorphism, then


• $1_G\theta = 1_H$, where $1_G$ and $1_H$ are the identity elements of $G$ and $H$ respectively;
• $g^{-1}\theta = (g\theta)^{-1}$ for all $g \in G$.


A homomorphism that is one-to-one and onto is called an isomorphism. If 
there is an isomorphism from G to H, then we say that G and H are isomorphic. 
As is the case for rings, if two groups are isomorphic, then from the point of view of 
abstract algebra they are the same, even if their elements are completely different. 

We have to deal with some unfinished business. 


Theorem 3.10 Two cyclic groups of the same order are isomorphic. 


Proof Let $G = \langle g \rangle$ and $H = \langle h \rangle$ be cyclic groups of the same order; that is,
either both are infinite, or both have order $n$ for some positive integer $n$.

We define a function $\theta : G \to H$ by the rule $g^m\theta = h^m$ for all $m \in \mathbb{Z}$. If
$G$ has infinite order, then the powers of $g$ are all distinct, and $\theta$ is well defined.
This is also true in the finite case. For suppose that $g^k = g^l$. Then $g^{k-l} = 1$, so
$n$ divides $k - l$. But then we have $h^{k-l} = 1$, so $h^k = h^l$.

Now $\theta$ is trivially a homomorphism, since

$$(g^kg^l)\theta = g^{k+l}\theta = h^{k+l} = h^kh^l = (g^k\theta)(g^l\theta).$$

It is clear that $\theta$ is onto. Finally, $\theta$ is obviously one-to-one if $G$ is infinite; while,
if $G$ has order $n$, then

$$g^k\theta = g^l\theta \Rightarrow h^k = h^l \Rightarrow h^{k-l} = 1 \Rightarrow n \mid (k - l) \Rightarrow g^{k-l} = 1 \Rightarrow g^k = g^l.$$

Thus $\theta$ is an isomorphism.
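The map in this proof can be written down explicitly for two concrete cyclic groups of order 4. The sketch assumes, for illustration only, the additive group of integers mod 4 (generated by 1) and the multiplicative group of units mod 5 (generated by 2); the name `theta` follows the proof, while the particular groups are not from the text.

```python
def theta(k):
    """Send k in (Z_4, +) to 2^k in the units modulo 5, mirroring g^m -> h^m."""
    return pow(2, k, 5)

images = [theta(k) for k in range(4)]

# Homomorphism property: addition mod 4 goes to multiplication mod 5.
is_homomorphism = all(
    theta((a + b) % 4) == theta(a) * theta(b) % 5
    for a in range(4)
    for b in range(4)
)

# One-to-one and onto the four units 1, 2, 3, 4.
is_bijection = sorted(images) == [1, 2, 3, 4]
```

The map is well defined on residues mod 4 because 2 has order 4 mod 5, which is the same point the proof makes with "$n$ divides $k - l$".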


As mentioned earlier, we denote the cyclic group of order $n$ by $C_n$.

We saw that a ring homomorphism blurs the structure of a ring; for groups,
the situation is very similar. 


Example We will find all homomorphisms from the group $\mathbb{Z}$ to itself. Let $\theta$
be a homomorphism. Suppose that $1\theta = n$. Then $2\theta = (1 + 1)\theta = n + n = 2n$,
and similarly (by induction), $m\theta = mn$ for all positive integers $m$. Moreover,
$0\theta = 0$, and for positive $m$ we have $(-m)\theta = -(m\theta) = -mn$. So $\theta$ multiplies every
integer by $n$.

So far, this is identical with the situation for rings. But now there is no
multiplicative structure to restrict things further: for any $m$, the function $\theta_m$
that multiplies everything by $m$ is a group homomorphism.


A homomorphism $\theta : G \to H$ is a function, and so has an image and a kernel
in the sense of Section 1.15. As for rings, we simplify the definition of the kernel.


Definition Let $\theta : G \to H$ be a homomorphism of groups. The image of $\theta$ is

$$\mathrm{Im}(\theta) = \{h \in H : h = g\theta \text{ for some } g \in G\},$$

and the kernel of $\theta$ is

$$\mathrm{Ker}(\theta) = \{g \in G : g\theta = 1\}.$$

Proposition 3.11 Let $\theta : G \to H$ be a group homomorphism. Then:


(a) $\mathrm{Im}(\theta)$ is a subgroup of $H$.

(b) $\mathrm{Ker}(\theta)$ is a subgroup of $G$ which has the additional property that, for any
$x \in \mathrm{Ker}(\theta)$ and $g \in G$, we have $g^{-1}xg \in \mathrm{Ker}(\theta)$.

(c) Two elements of $G$ are mapped to the same element of $H$ under $\theta$ if and
only if they lie in the same right coset of $\mathrm{Ker}(\theta)$.


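Proof (a) We apply the subgroup test. Take $h_1, h_2 \in \mathrm{Im}(\theta)$. Then $h_1 = g_1\theta$
and $h_2 = g_2\theta$, for some $g_1, g_2 \in G$. Then

$$h_1h_2^{-1} = (g_1\theta)(g_2\theta)^{-1} = (g_1g_2^{-1})\theta \in \mathrm{Im}(\theta);$$

so $\mathrm{Im}(\theta)$ is a subgroup of $H$.
(b) Similarly, take $g_1, g_2 \in \mathrm{Ker}(\theta)$. Then $g_1\theta = g_2\theta = 1$; so

$$(g_1g_2^{-1})\theta = (g_1\theta)(g_2\theta)^{-1} = 1;$$

so $\mathrm{Ker}(\theta)$ is a subgroup.
Now we check the extra condition. Suppose that $x \in \mathrm{Ker}(\theta)$ and $g \in G$. Then

$$(g^{-1}xg)\theta = (g\theta)^{-1} \cdot 1 \cdot (g\theta) = 1,$$

and so $g^{-1}xg \in \mathrm{Ker}(\theta)$.
(c) Suppose that $g_1\theta = g_2\theta$. Then $(g_2g_1^{-1})\theta = 1$, so $x = g_2g_1^{-1} \in \mathrm{Ker}(\theta)$;
then $g_2 = xg_1 \in \mathrm{Ker}(\theta)g_1$, so $g_1$ and $g_2$ lie in the same right coset of $\mathrm{Ker}(\theta)$.
Conversely, if $g_1$ and $g_2$ lie in the same coset, say $g_2 = xg_1$ with $x \in \mathrm{Ker}(\theta)$, then
$g_2\theta = (x\theta)(g_1\theta) = g_1\theta$, since $x \in \mathrm{Ker}(\theta)$.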
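Part (c) of the proposition is easy to see on a small additive example. The sketch assumes the map from the integers mod 12 to the integers mod 4 given by reduction mod 4; this particular homomorphism is an illustration chosen here, not one from the text, and in additive notation the coset $\mathrm{Ker}(\theta)g$ becomes $\mathrm{Ker}(\theta) + g$.

```python
N, M = 12, 4   # reduction mod 4 is well defined on Z_12 because 4 divides 12

def theta(x):
    """The map Z_12 -> Z_4 sending x to x mod 4."""
    return x % M

is_homomorphism = all(
    theta((a + b) % N) == (theta(a) + theta(b)) % M
    for a in range(N)
    for b in range(N)
)

kernel = [x for x in range(N) if theta(x) == 0]

# b lies in the coset kernel + a exactly when b - a lies in the kernel.
same_image_iff_same_coset = all(
    (theta(a) == theta(b)) == ((b - a) % N in kernel)
    for a in range(N)
    for b in range(N)
)
```

The kernel is {0, 4, 8}, and the four cosets of the kernel are exactly the four sets of elements sharing an image.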


As in the case of rings, the extra property of the subgroup $\mathrm{Ker}(\theta)$ is so
important that it is given a special name. A normal subgroup of a group $G$ is
a subgroup $H$ of $G$ such that, for any $x \in H$ and $g \in G$, we have $g^{-1}xg \in H$.
We write $H \trianglelefteq G$ to indicate that $H$ is a normal subgroup of $G$. If $H \trianglelefteq G$ and
$H$ is not equal to $G$, we write $H \triangleleft G$.

So we can say more briefly: The kernel of a homomorphism $\theta : G \to H$ is
a normal subgroup of $G$. Normal subgroups play much the same role in group
theory that ideals do in ring theory.

There are several equivalent definitions of a normal subgroup. 


Theorem 3.12 Let H be a subgroup of a group G. Then the following are 
equivalent: 


(a) for all $g \in G$, $x \in H$, we have $g^{-1}xg \in H$;
(b) for all $g \in G$, we have $g^{-1}Hg = H$;
(c) for all $g \in G$, we have $Hg = gH$.


Proof Since, by definition, $g^{-1}Hg$ is the set of all elements $g^{-1}xg$ for $x \in H$,
we see that condition (a) can be rewritten as $g^{-1}Hg \subseteq H$. So (a) is implied by
(b). Conversely, suppose that (a) holds, so that $g^{-1}Hg \subseteq H$ for all $g$. Replacing
$g$ by $g^{-1}$, we see that $gHg^{-1} \subseteq H$. Multiply this inclusion on the left by $g^{-1}$ and
on the right by $g$, to obtain $H \subseteq g^{-1}Hg$. So equality holds, and we have (b).
We get from (b) to (c) by multiplying on the left by $g$, and back again by
multiplying on the left by $g^{-1}$.


Part (c) of the theorem says that a subgroup is normal if and only if its 
left and right cosets are the same. So our earlier example (with $G = S_3$ and
$H = S_2$) of a subgroup with different left and right cosets is also an example of
a non-normal subgroup. 

Here are some simple tests which guarantee that a subgroup is normal. 


Proposition 3.13 Let H be a subgroup of a group G. Each of the following 
conditions implies that H is a normal subgroup: 


(a) G is abelian; 
(b) H is finite and is the only subgroup of G of its order; 
(c) H has index 2 in G. 


Proof We illustrate the three parts of the preceding theorem. 


(a) If $G$ is abelian, then $g^{-1}xg = x$ for all $x, g \in G$. So test (a) applies.

(b) Suppose that $H$ is the only subgroup of $G$ of order $m$, for some finite
$m$. It is not hard to show that $g^{-1}Hg$ is a subgroup of $G$, also of order $m$. So
$g^{-1}Hg = H$ for any $g \in G$, and test (b) applies.

(c) The statement that $H$ has index 2 in $G$ implies that it has just two right
cosets, one of which is $H$, so the other coset must be all the rest, namely $G \setminus H$.
In the same way, $H$ has just two left cosets, $H$ and $G \setminus H$. Hence the left and
right cosets coincide, and test (c) applies.
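Test (a) of Theorem 3.12 can be applied directly to small subgroups of $S_3$. The sketch below assumes permutations stored as tuples of images of 1, 2, 3 and multiplied left to right; `compose`, `inverse` and `is_normal` are hypothetical helper names.

```python
from itertools import permutations

def compose(g, h):
    """Product gh: apply g first, then h."""
    return tuple(h[g[i] - 1] for i in range(len(g)))

def inverse(g):
    """The permutation undoing g: if g sends i to j, the inverse sends j to i."""
    inv = [0] * len(g)
    for i, image in enumerate(g, start=1):
        inv[image - 1] = i
    return tuple(inv)

def is_normal(H, G):
    """Check that g^(-1) x g lies in H for every x in H and g in G."""
    H = set(H)
    return all(compose(compose(inverse(g), x), g) in H for g in G for x in H)

G = list(permutations((1, 2, 3)))
S2 = [(1, 2, 3), (2, 1, 3)]               # identity and (1,2)
A3 = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]    # identity and the two 3-cycles

s2_is_normal = is_normal(S2, G)
a3_is_normal = is_normal(A3, G)
```

The subgroup of order 2 fails the test, in line with its left and right cosets being different, while the subgroup of order 3 has index 2 and passes.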


3.9 Factor groups and isomorphism theorems. Let N be a normal sub- 
group of a group G. We are going to define a ‘factor group’ G/N whose elements 
are the cosets of N in G (left or right, it’s the same, since N is normal). This works 
in much the same way as for factor rings. We then prove the exact analogues of 
the isomorphism theorems for rings. Since the proofs are virtually identical to 
the earlier ones, the discussion will be briefer. 


Definition Let $N$ be a normal subgroup in the group $G$. The factor group,
or quotient group, $G/N$ is the set of (left or right) cosets of $N$ in $G$, with
operation defined by

$$(Ng_1)(Ng_2) = Ng_1g_2.$$

Theorem 3.14 The factor group, as defined above, is indeed a group.


Proof First we have to check that the definition is a good one: that is, if $g_1'$ and
$g_2'$ represent the same cosets as $g_1$ and $g_2$ respectively, then $g_1'g_2'$ represents the
same coset as $g_1g_2$. So suppose that $Ng_1 = Ng_1'$ and $Ng_2 = Ng_2'$. Say $g_1' = xg_1$
and $g_2' = yg_2$, where $x, y \in N$. Then

$$g_1'g_2' = xg_1yg_2 = xzg_1g_2,$$

where the last equality holds because $g_1y \in g_1N = Ng_1$, and so $g_1y$ is equal to
$zg_1$ for some $z \in N$. But then $Ng_1'g_2' = Ng_1g_2$, as required, since $xz \in N$.

Now the rest of the proof involves verifying the axioms, which is routine. The
closure law needs no proof, since the operation is well defined. For the associative
law (G1), we have

$$((Ng_1)(Ng_2))(Ng_3) = (Ng_1g_2)(Ng_3) = N(g_1g_2)g_3,$$

$$(Ng_1)((Ng_2)(Ng_3)) = (Ng_1)(Ng_2g_3) = Ng_1(g_2g_3),$$

and the right-hand sides are equal, by the associative law for $G$. The identity of
$G/N$ is $N1 = N$, and the inverse of $Ng$ is $Ng^{-1}$.


The factor group comes as the image of a natural homomorphism, which (as
in the case of rings) is called the canonical homomorphism. Remember that
the elements of $G/N$ are the cosets of $N$ in $G$. Now define a map $\theta : G \to G/N$
by the rule that $g\theta = Ng$ for all $g \in G$. Checking that $\theta$ is a homomorphism is
straightforward from the definition of the operation in $G/N$:

$$(g_1\theta)(g_2\theta) = (Ng_1)(Ng_2) = Ng_1g_2 = (g_1g_2)\theta.$$

The image of $\theta$ is $G/N$, since every coset has the form $Ng$ for some $g \in G$. What
is the kernel of $\theta$? Since the identity element of $G/N$ is the coset $N$, we have

$$\mathrm{Ker}(\theta) = \{g \in G : Ng = N\} = \{g \in G : g \in N\} = N.$$

Hence we have proved the following:


Theorem 3.15 The canonical homomorphism $\theta : G \to G/N$ defined by $g\theta = Ng$
for $g \in G$ is indeed a homomorphism; its image is $G/N$ and its kernel is $N$.


Now we return to our analysis of the image and kernel of an arbitrary 
homomorphism. 


Theorem 3.16 (First Isomorphism Theorem) Let $\theta : G \to H$ be a group
homomorphism. Then:


(a) $\mathrm{Im}(\theta)$ is a subgroup of $H$;
(b) $\mathrm{Ker}(\theta)$ is a normal subgroup of $G$;
(c) $G/\mathrm{Ker}(\theta) \cong \mathrm{Im}(\theta)$.


Proof We have already shown (a) and (b). For (c), there is only one reasonable
definition of a map $\phi$ from $G/N$ to $H$, where $N = \mathrm{Ker}(\theta)$: we must put
$(Ng)\phi = g\theta$ for all $g \in G$. As in the ring case, we can show that $\phi$ is well defined, that it
is a homomorphism, that it is onto $\mathrm{Im}(\theta)$, and that it is one-to-one.


The second and third ‘Isomorphism Theorems’ relating a group G to a factor 
group G/N also work as in rings. 


Theorem 3.17 (Second Isomorphism Theorem) Let N be a normal sub- 
group of G. There is a one-to-one correspondence between the set of subgroups of 
G which contain N and the set of subgroups of G/N. Under this correspondence, 
normal subgroups of G containing N correspond to normal subgroups of G/N. 


Theorem 3.18 (Third Isomorphism Theorem) Let N be a normal 
subgroup of G and H a subgroup of G. Then: 


(a) $NH = \{nh : n \in N, h \in H\}$ is a subgroup of $G$ containing $N$;
(b) $N \cap H$ is a normal subgroup of $H$;
(c) $H/(N \cap H) \cong (NH)/N$.


3.10 Conjugacy. There is another equivalence relation defined on a group, 
which is very important. 

Let $G$ be a group. We say that elements $x, y$ of $G$ are conjugate (written
$x \sim y$) if $y = g^{-1}xg$ for some $g \in G$.

Conjugacy is an equivalence relation, since it is


(Eq1) reflexive: $x = 1^{-1}x1$ for any $x \in G$;
(Eq2) symmetric: if $y = g^{-1}xg$, then $x = (g^{-1})^{-1}yg^{-1}$;
(Eq3) transitive: if $y = g^{-1}xg$ and $z = h^{-1}yh$, then $z = (gh)^{-1}x(gh)$.

Hence, G is the disjoint union of the equivalence classes, which are called 
conjugacy classes. 
Conjugacy classes are closely related to normal subgroups: 


Proposition 3.19 The subgroup H of the group G is normal if and only if H 
is the union of some (possibly all) of the conjugacy classes of G. 


Proof One of the equivalent conditions for normality of $H$ is that $g^{-1}Hg = H$
for all $g \in G$. But this says that $g^{-1}xg \in H$ for all $x \in H$ and all $g \in G$; that is,
for every element of $H$, its entire conjugacy class is contained in $H$.


Another important property of conjugacy is the following.

Proposition 3.20 Conjugate elements of a group have the same order.


Proof

$$(g^{-1}xg)^n = (g^{-1}xg)(g^{-1}xg) \cdots (g^{-1}xg) = g^{-1}x^ng,$$

so $(g^{-1}xg)^n = 1$ if and only if $x^n = 1$.


We can calculate the size of a conjugacy class in terms of another subgroup
of $G$. The centraliser of the element $x \in G$, written $C_G(x)$, is the set of all
elements of $G$ which commute with $x$:

$$C_G(x) = \{g \in G : gx = xg\}.$$


Theorem 3.21 (a) For any element $x \in G$, $C_G(x)$ is a subgroup of $G$.

(b) There is a bijection between the conjugacy class of an element $x$ of $G$ and
the set of cosets of $C_G(x)$ in $G$.

(c) (the class equation)

$$\sum_i \frac{1}{|C_G(x_i)|} = 1,$$

where the elements $x_i$ are representatives of the conjugacy classes.
Proof (a) If $g$ and $h$ commute with $x$, so do $gh$ and $g^{-1}$.

(b) The bijection takes the conjugate $g^{-1}xg$ to the coset $C_G(x)g$. To show
that it is a bijection, we must show that $g^{-1}xg = h^{-1}xh$ if and only if
$C_G(x)g = C_G(x)h$. In fact, both are equivalent to the assertion that $gh^{-1} \in C_G(x)$.

(c) It follows from (b) and Lagrange’s Theorem 3.5 that the size of the conjugacy
class containing $x$ is equal to $|G|/|C_G(x)|$. Now the sum, over a set of
conjugacy class representatives, of the class sizes, is clearly equal to $|G|$. Dividing
this equation by $|G|$ gives the result.
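The class equation can be verified by direct enumeration in $S_3$. The following is a sketch only: permutations are stored as tuples of images of 1, 2, 3 and multiplied left to right, and the helper names are hypothetical. Exact fractions are used so that the sum is compared with 1 without rounding.

```python
from fractions import Fraction
from itertools import permutations

def compose(g, h):
    """Product gh: apply g first, then h."""
    return tuple(h[g[i] - 1] for i in range(len(g)))

def inverse(g):
    """The permutation undoing g."""
    inv = [0] * len(g)
    for i, image in enumerate(g, start=1):
        inv[image - 1] = i
    return tuple(inv)

G = list(permutations((1, 2, 3)))

def conjugacy_class(x):
    """All conjugates g^(-1) x g of x."""
    return frozenset(compose(compose(inverse(g), x), g) for g in G)

def centraliser(x):
    """All elements of G commuting with x."""
    return [g for g in G if compose(g, x) == compose(x, g)]

classes = {conjugacy_class(x) for x in G}
class_sizes = sorted(len(c) for c in classes)

# One representative per class, then the sum of 1/|C_G(x_i)|.
representatives = [min(c) for c in classes]
class_equation_sum = sum(Fraction(1, len(centraliser(x))) for x in representatives)
```

The three classes (the identity, the two 3-cycles, the three transpositions) have centralisers of orders 6, 3 and 2, and 1/6 + 1/3 + 1/2 = 1.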


The centre of a group G, written Z(G), is the set of elements of G which 
commute with every element of G: 


Z(G) = {x ∈ G : xg = gx for all g ∈ G}.
(The letter Z stands for the German Zentrum, ‘centre’.) 
Proposition 3.22 The centre of a group G is a normal subgroup. 


Proof To show that it is a subgroup, apply the subgroup test: if x and y commute
with everything in G, then so does xy^{-1}. Now an element x belongs to Z(G) if
and only if x is conjugate only to itself; that is, its conjugacy class is {x}. Thus,
Z(G) is a union of conjugacy classes, and hence a normal subgroup.

We can use these ideas to show that, if the order of G is a prime power, then 
necessarily G has a non-trivial normal subgroup. 


Theorem 3.23 Let G have order p^n, where p is prime and n > 0. Then
Z(G) ≠ {1}.


Proof The sum of the conjugacy class sizes in G is p^n. But each class size is
a divisor of p^n, say p^{a_i}, for i = 1,...,m. Suppose that k of these class sizes are
equal to p^0 = 1, so that |Z(G)| = k. All of the others are powers of p which are
at least p, and hence are divisible by p. So we obtain k + lp = p^n for some
integer l ≥ 0. It follows that k is divisible by p. Thus Z(G) is a subgroup of G
whose order is divisible by p, hence is not 1.
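
To illustrate the counting argument, the following Python sketch (an addition, not from the original text) builds a group of order 8 = 2^3 from two permutations of {0, 1, 2, 3}, lists its class sizes, and compares the number k of classes of size 1 with the order of the centre. The choice of generators and all function names are assumptions made for the example.

```python
def compose(p, q):
    """Apply p first, then q (left-to-right, as in the text)."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generate(gens):
    """Close the generators under composition; for a finite set of
    permutations this yields the subgroup they generate."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

rotation = (1, 2, 3, 0)     # the 4-cycle 0 -> 1 -> 2 -> 3 -> 0
reflection = (0, 3, 2, 1)   # swaps 1 and 3, fixes 0 and 2
G = generate([rotation, reflection])          # a group of order 8 = 2^3

classes = {frozenset(compose(compose(inverse(g), x), g) for g in G) for x in G}
sizes = sorted(len(c) for c in classes)
centre = {z for z in G if all(compose(z, g) == compose(g, z) for g in G)}

k = sizes.count(1)          # number of classes of size 1
print(len(G), sizes, k == len(centre))   # 8 [1, 1, 2, 2, 2] True
```

In this example k = 2, which is divisible by p = 2, in line with the proof.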


There are other applications of these ideas. For example, we say that two
subgroups H and K of the group G are conjugate if K = g^{-1}Hg for some
g ∈ G. Again, this is an equivalence relation on subgroups. [Check first that, if
H is a subgroup, then so is g^{-1}Hg.] Now a subgroup is normal if and only if it
is conjugate only to itself. Moreover, conjugate subgroups have the same order.
So, if H is the only subgroup of G of its order, then it is necessarily normal.


Exercise 3.18 Let F be a field, and let G be the set


{  ( a  b )
   ( 0  1 )  :  a, b ∈ F, a ≠ 0  }


of 2 × 2 matrices over F. Let N be the set of all matrices in G with a = 1, and H the
set of all matrices in G with b = 0.


(a) Prove that G is a group.
(b) Prove that N is a normal subgroup of G, which is isomorphic to the additive
group of F.
(c) Prove that H is a subgroup of G, which is isomorphic to the multiplicative group
of F. Is it normal?
(d) Prove that G/N ≅ H.


Exercise 3.19 (*) Prove that the group of the preceding exercise, in the case where
F = Z_3, is isomorphic to the symmetric group S_3.


Exercise 3.20 Let G be the set of all ordered pairs (a, b), where a and b are real
numbers and a ≠ 0. Define an operation ∘ on G by the rule that


(a_1, b_1) ∘ (a_2, b_2) = (a_1 a_2, b_1 a_2 + b_2).


Prove carefully that G with this operation is a group.

Now show that G is isomorphic to the group of all permutations of the real numbers
of the form x ↦ ax + b, where a ≠ 0.

Could you use this information to make the argument for the first part of the 
question easier? 


Exercise 3.21 Let G be a group with the property that G/Z(G) is cyclic. If G/Z(G)
is generated by the coset Z(G)g, show that every element of G can be written in the
form zg^i for some z ∈ Z(G) and some i ∈ ℤ. Deduce that G is abelian (so that in fact,
Z(G) = G).


Exercise 3.22 (*) (a) Let G be a group. For any element g ∈ G, let ι_g be the function
from G to G defined by xι_g = g^{-1}xg. Show that ι_g is an automorphism of G. [It is
called the inner automorphism induced by the element g.]

(b) Show that the set {ι_g : g ∈ G} is a subgroup of Aut(G). [This subgroup is called
the inner automorphism group of G, denoted Inn(G).]

(c) Show that the map g ↦ ι_g is a homomorphism from G to Aut(G), whose image
is the inner automorphism group Inn(G) and whose kernel is the centre Z(G). Deduce
that Inn(G) ≅ G/Z(G).

(d) Show that Inn(G) is a normal subgroup of Aut(G). [By definition, the factor
group Aut(G)/Inn(G) is the outer automorphism group of G, denoted Out(G).]


Exercise 3.23 Find all subgroups of the symmetric group S_3. Which of them are
normal subgroups? [S_3 is the group of all permutations of {1, 2, 3}.]


Exercise 3.24 Show that the group of all real numbers (with the operation of addi- 
tion) is isomorphic to the group of positive real numbers (with the operation of 
multiplication). 


Exercise 3.25 Let N be a normal subgroup of a group G, and let H be any subgroup.
Show that NH is a subgroup of G. [Recall that NH is the set {nh : n ∈ N, h ∈ H}.]
Which (if any) of the following statements are true?


(a) If H is a normal subgroup, then NH is a normal subgroup.
(b) If NH is a normal subgroup, then H is a normal subgroup.


Exercise 3.26 Let G_1 be the group of integers (with the operation of addition), and
G_2 = Z_n, the group of complex nth roots of unity (with the operation of multiplication).
Define a function θ : G_1 → G_2 by the rule


θ(k) = e^{2πik/n}


for k ∈ G_1.
(a) Prove that θ is a homomorphism. What are its image and kernel?
(b) Show that the cosets of the kernel of θ have the form


{x : x ∈ ℤ, x ≡ k (mod n)}


for k = 0, 1, ..., n − 1. Hence show that the additive group of integers mod n is
isomorphic to Z_n.


Exercise 3.27 (a) Prove that C_2 is the only finite group which has just two conjugacy
classes. [Hint: Use the class equation.]

(b) Find the conjugacy classes in S_3. [There are three of them.]

(c) Prove that no finite group of order greater than 6 can have three conjugacy
classes.

(*) (d) Show that there is a function f such that a finite group with r conjugacy
classes has order at most f(r).


Exercise 3.28 Our first example in this chapter was a construction of an abelian group 
from a ring (the additive group of the ring). In this exercise, we reverse the procedure 
and construct a ring from an abelian group. 

Let A be an abelian group. An endomorphism of A is a homomorphism from A 
to A. Let End(A) be the set of all endomorphisms of A. We define two operations on 
End(A), pointwise addition, and composition, as follows: 


a(θ + φ) = aθ + aφ,
a(θφ) = (aθ)φ.


Prove that End(A), equipped with these operations, is a ring. (It is called, naturally, 
the endomorphism ring of A.) 


Some special groups 


3.11  Cayley’s Theorem. Before the rise of the axiomatic method in the 
late nineteenth century, group theory was already a flourishing subject; but, of 
course, the meaning of the term ‘group’ was different. A group always consisted of 
elements of some special type, with a specified composition law. Most commonly, 
a group was either a permutation group (whose elements are permutations 
of a set, and whose operation is composition of permutations), or a matrix 
group (whose elements are matrices, and whose operation is matrix multipli- 
cation). In modern terminology, we could say that the early group theorists 
studied subgroups of the symmetric group Sym(Ω) or of the general linear group
GL(n, F). 

In order that this body of knowledge should not be lost, it is necessary to 
ensure that the new groups (axiomatically defined) are really the same as the 
old ones. We already showed in Section 3.2 that the symmetric group and the 
general linear group are groups in the axiomatic sense, and hence their sub- 
groups are too. The point of Cayley’s Theorem is to show the converse of this for 
permutation groups: that is, every group ‘is’ a permutation group. Of course this
is not literally true, since the elements may not be permutations. In the spirit of
abstract algebra, the correct statement goes like this:


Theorem 3.24 (Cayley’s Theorem) Every group is isomorphic to a permu- 
tation group (a subgroup of the symmetric group). 


Cayley proved this theorem by means of the Cayley table of the group. Before
embarking on the general proof, we will work a particular case: the Klein group
V_4. To begin the proof, we rewrite its Cayley table, calling the group elements
{g_1, g_2, g_3, g_4}.


 ∘  | g_1  g_2  g_3  g_4
----+-------------------
g_1 | g_1  g_2  g_3  g_4
g_2 | g_2  g_1  g_4  g_3
g_3 | g_3  g_4  g_1  g_2
g_4 | g_4  g_3  g_2  g_1


Now we use the columns of this table to define some permutations. Each
column is labelled with one of the elements g_1, ..., g_4. It contains all the elements
g_1, ..., g_4 in some order. We let π_j be the permutation which maps the number
i to the index of the element in the ith row and jth column.

For example, in the second column, the elements are (g_2, g_1, g_4, g_3), and so
π_2 is the permutation which maps 1 to 2, 2 to 1, 3 to 4, and 4 to 3; that is,
in cycle notation, π_2 = (1,2)(3,4). In the same way, we find the other three
permutations: π_1 is the identity; π_3 = (1,3)(2,4); and π_4 = (1,4)(2,3).

Now the proof of Cayley’s Theorem for this group consists in showing that
{π_1, π_2, π_3, π_4} is a group, and moreover, is isomorphic to the original Klein
group (where the isomorphism maps g_i to π_i for i = 1, 2, 3, 4).
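
The construction can be tried out directly. The Python sketch below (an illustration added here, with hypothetical names) reads the permutations π_j off the columns of the Klein group's Cayley table and checks that g_s g_t = g_u implies π_s π_t = π_u, composing permutations from left to right.

```python
# Cayley table of the Klein group: table[i-1][j-1] is the index k with
# g_i g_j = g_k (indices 1..4 as in the text).
table = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [3, 4, 1, 2],
    [4, 3, 2, 1],
]
n = len(table)

def column_permutation(j):
    """pi_j maps i to the index found in row i, column j."""
    return {i: table[i - 1][j - 1] for i in range(1, n + 1)}

def compose(p, q):
    """Apply p first, then q."""
    return {i: q[p[i]] for i in p}

pi = {j: column_permutation(j) for j in range(1, n + 1)}

# Homomorphism check: if g_s g_t = g_u then pi_s pi_t = pi_u.
for s in range(1, n + 1):
    for t in range(1, n + 1):
        u = table[s - 1][t - 1]
        assert compose(pi[s], pi[t]) == pi[u]

print(pi[2])   # {1: 2, 2: 1, 3: 4, 4: 3}, i.e. (1,2)(3,4)
```

The four dictionaries are distinct, so the map sending g_i to π_i is one-to-one on this example.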


Proof of Cayley’s Theorem Let G = {g_1, ..., g_n}. (This proof presupposes
that the group G is finite; but in fact it works in the same way for infinite n,
except that the set of indices of the g_i will be infinite.) We take Ω = {1, ..., n},
and find a subgroup of Sym(Ω) isomorphic to G.

First, we define permutations π_1, ..., π_n of Ω. For 1 ≤ i ≤ n, let π_i be the
function which maps j to k if g_j g_i = g_k holds in G. (This corresponds as above
to the ith column of the Cayley table of G.) This function is one-to-one: for, if
jπ_i = lπ_i = k, then g_j g_i = g_l g_i = g_k, whence g_j = g_l by the Right Cancellation
Law, so j = l. Moreover, it is onto, since for any k, if g_k g_i^{-1} is the element g_j,
then g_j g_i = g_k and so jπ_i = k. Hence π_i is a bijection, and thus a permutation.

Now we define a map θ from G to Sym(Ω) by the rule that g_iθ = π_i. We
claim that θ is a homomorphism, and that Ker(θ) = {1}.

Take g_s, g_t ∈ G. Suppose that g_s g_t = g_u. We have to show that π_s π_t = π_u,
by applying both sides to an arbitrary element i and checking that the results
are equal.
Let iπ_s = j, jπ_t = k, iπ_u = l; we must show that k = l. By definition, we
have


g_i g_s = g_j,   g_j g_t = g_k,   g_i g_u = g_l.


Hence


g_l = g_i(g_s g_t) = (g_i g_s)g_t = g_j g_t = g_k,


and so l = k as required.
Now suppose that g_i ∈ Ker(θ). Then π_i is the identity, so jπ_i = j for all j.
But this means that g_j g_i = g_j, whence (by the Left Cancellation Law) g_i = 1.
So Ker(θ) = {1}.
Thus, by the First Isomorphism Theorem, G is isomorphic to Im(θ), a
subgroup of Sym(Ω).


This is not the only way of finding a subgroup of a symmetric group which is
isomorphic to a given group. For example, the proof of Cayley’s Theorem shows
that S_3 (a group of order 6) is isomorphic to a subgroup of S_6; but this group is
given to us as a subgroup of S_3. Again, an entirely different way to realise the
Klein group inside S_4 is given in Exercise 3.29.

As for matrix groups: Exercise 3.30 shows that every finite permutation group 
is isomorphic to a matrix group. Hence, by Cayley’s Theorem, every finite group 
is isomorphic to a matrix group. However, not every infinite group is isomorphic 
to a matrix group. 


3.12 Small groups. How many different groups are there? In this section, 
we will examine groups of small order (up to 8), and verify most of the 
entries in Figure 3.1, which gives the number of groups of given order (up to 
isomorphism). 

Of course, there is only one group of order 1, since there is no choice 
about the operation! The next observation settles four of the remaining seven 
values. 


Proposition 3.25 A group of prime order is cyclic. 


Proof Let G be a group of prime order p. Take any element g ∈ G which is
not the identity. The order of g divides p, and is not 1, hence is p. Now we know 
that a group G containing an element g whose order is equal to |G| is cyclic. 
(The p distinct powers of g must comprise the whole of G.) 


Order             | 1  2  3  4  5  6  7  8
Number of groups  | 1  1  1  2  1  2  1  5


Fig. 3.1 Small groups


Of course, we also know that cyclic groups of the same order are isomorphic.
This verifies the table entries for orders 2, 3, 5, and 7.
The remaining entries require more work. 


Groups of order 4 Let G be a group of order 4. If G contains an element of
order 4, then it is cyclic. So suppose not. Then the order of any element of G
apart from the identity is equal to 2; in other words, g^2 = 1 (or, equivalently,
g^{-1} = g) for all g ∈ G.

From this, we deduce that G is abelian. For, take any two elements
g, h ∈ G. Then


gh = g^{-1}h^{-1} = (hg)^{-1} = hg.


Let G = {1, a, b, c}. We construct the operation in G. We know the product
of 1 with anything, and we know that g^2 = 1 for any element g. What is ab? It
cannot be 1, since this implies b = a^{-1} = a; it cannot be a, since this implies b = 1
by cancellation; and similarly it cannot be b. So necessarily ab = c. In exactly
the same way, the product of any two of a, b, c is the third; so the multiplication
is determined.

We have shown that there are at most two non-isomorphic groups of order 4,
the cyclic group and one other. Since we already know that two different groups
exist (namely, the cyclic group C_4 and the Klein group V_4), the verification is
complete.


Groups of order 6 We begin with a useful result. 


Proposition 3.26 Let G be a finite group in which every element g satisfies
g^2 = 1. Then the order of G is a power of 2.


Proof This follows from Cauchy’s Theorem 3.9, proved in Exercise 3.16: for if 
|G| were not a power of 2, it would be divisible by some odd prime p, and G 
would contain an element of order p. Here is a direct proof. 

We showed above that G must be abelian. Now verify the following: 


If H is a subgroup of G and g ∉ H, then H ∪ Hg is a subgroup of G.


We prove this using the First Subgroup Test. Every element of G is equal to
its inverse, so H ∪ Hg is closed under taking inverses. We have to show that it
is closed under multiplication. Take two elements of H ∪ Hg; each is of the form
h or hg, for some h ∈ H. Since G is abelian, we can bring the g's to the end of
the product; we find an element of one of the forms h_1h_2, h_1h_2g, or h_1h_2g^2. By
closure of H and the fact that g^2 = 1, this element is in H ∪ Hg.

Now start with the identity and form an increasing chain of subgroups by
applying this result as long as the current subgroup is not the whole of G.
Eventually the process must terminate when we reach G. But each subgroup in
the chain is twice as large as its predecessor, and so has order a power of 2.

We return to the matter in hand. Let G be a group of order 6. If G contains
an element of order 6, then it is cyclic; so suppose not. Then any element of
G has order 1, 2, or 3. Since 6 is not a power of 2, there must be an element
a of order 3. Also, since 6 is even, there must be an element b of order 2 (see
Exercise 3.15).

We claim that the elements 1, a, a^2, b, ab, a^2b of G are all distinct. Certainly
the first three are all distinct, since a has order 3, and the last three are all
distinct, by the Right Cancellation Law. Moreover, b is different from 1, a, a^2,
since it has order 2. Consider ab. If ab = 1, then b = a^{-1}; if ab = a, then b = 1;
if ab = a^2, then b = a. All are impossible. Similarly for a^2b.

So G = {1, a, a^2, b, ab, a^2b}. It remains to determine the multiplication.

We will know how to multiply any two elements once we know which of
the six elements is ba. For the only difficulty will occur when we multiply an
element ending with b by an element beginning with a; any other product can
be identified by the rules we have already. (For example, (a^2)(a^2b) = a^4b = ab.)
Now ba is not equal to 1 (or b = a^{-1}); it is not a (or b = 1); it is not a^2 (or
b = a); and it is not b (or a = 1). So ba = ab or ba = a^2b.

If ba = ab, then (ab)^n = a^n b^n for all n, by the Exponent Law. Then (ab)^2 =
a^2 ≠ 1, (ab)^3 = b^3 = b ≠ 1. So the order of ab is not 1, 2, or 3; it must be 6,
contrary to our case assumption. So it must be the case that ba = a^2b. Then the
multiplication is determined, so there is at most one group (up to isomorphism).

Since we already know two groups of order 6 (the cyclic group C_6 and the
symmetric group S_3), the entry in the table is verified.
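
To see concretely that the relation ba = a^2b determines a consistent multiplication, here is a short Python sketch (added for illustration; the encoding is an assumption of the example). The element a^i b^j is stored as the pair (i, j), and the product rule is derived from a^3 = b^2 = 1 and ba = a^2b.

```python
from itertools import product

def multiply(x, y):
    """Multiply a^i b^j by a^k b^l using a^3 = b^2 = 1 and ba = a^2 b."""
    (i, j), (k, l) = x, y
    # Moving b^j past a^k replaces a^k by a^(k * 2^j).
    return ((i + k * 2 ** j) % 3, (j + l) % 2)

elements = list(product(range(3), range(2)))   # 1, b, a, ab, a^2, a^2 b
identity = (0, 0)

associative = all(
    multiply(multiply(x, y), z) == multiply(x, multiply(y, z))
    for x, y, z in product(elements, repeat=3)
)
has_inverses = all(
    any(multiply(x, y) == identity for y in elements) for x in elements
)
abelian = all(
    multiply(x, y) == multiply(y, x) for x, y in product(elements, repeat=2)
)
print(associative, has_inverses, abelian)   # True True False
```

The rule therefore does define a non-abelian group of order 6, consistent with the group S_3 already known.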


Groups of order 8 This is more difficult (in part because there are five
different groups), and we will simply outline the steps. Let G have order 8.
If G contains an element of order 8, then G is cyclic; if every element g satisfies
g^2 = 1, there is just one type of group. So we may assume that every element
has order 2 or 4 (except for the identity), and that there is an element a of
order 4.

Take any element b which is not a power of a. Then, as above, we find
that G = {1, a, a^2, a^3, b, ab, a^2b, a^3b}. Now we need two pieces of information to
determine the multiplication: we have to know which of these eight elements is
b^2, and which is ba. We find that b^2 = 1 or b^2 = a^2, and that ba = ab or ba = a^3b.
This seems to give us four different groups, which added to the two already found
would make six. But two of them are the same. In the group given by b^2 = 1
and ba = ab, set b' = ab; then (b')^2 = a^2 and b'a = ab'. So the same group arises
in two different guises.

It remains to show that all five possibilities really are groups (Exercise 3.34).


3.13 Symmetric and alternating groups. We have already met the
symmetric group S_n, the group of all permutations of {1, ..., n}. In this section, we
will decide when two elements of S_n are conjugate, and find the normal subgroups
of S_n for n ≤ 5.

The conjugacy test will depend on the cycle notation for permutations, which
we met in Section 3.2. To review, let π be a permutation of {1, ..., n}. To write
cycle notation for π, we do the following: Open a bracket and write any element
of {1, ..., n} (we might as well take 1, but it does not matter). Then write the
image of this point under successive applications of π, until this image returns
to its starting point (which it does in a finite number of steps). Then close the
bracket. While any elements of {1, ..., n} remain, open a bracket, and follow the
same procedure.

The cycle structure of π is the list of the lengths of its cycles. Each number
occurs as many times in the list as there are cycles of that length. By
convention, we write the cycle lengths in increasing order. Thus, the permutation
(2,5)(3,7,8)(6,9) of {1, ..., 9} has cycle structure 1,1,2,2,3. (Remember that
we do not write cycles of length 1.)

Here is a first indication of the kind of information that we can read off from 
the cycle structure. 


Proposition 3.27 The order of a permutation is equal to the least common 
multiple of the lengths of its cycles. 


Proof Let the cycle lengths of π be a_1, ..., a_k. If we compose π a_i times, then
all points in the cycle of length a_i return to their starting positions. The same is
true if we compose it any number of times which is a multiple of a_i. So, if we
evaluate π^m, where m is divisible by all of a_1, ..., a_k, then every point returns to
its starting position; that is, π^m = (1). On the other hand, if m is not divisible
by some a_i, say m = a_i q + r, where 0 < r < a_i, then points in a cycle of length a_i
are shifted r places along by π^m, so π^m ≠ (1). We conclude that π^m = 1 if and
only if m is a common multiple of a_1, ..., a_k. So the order of π, which is the least
positive m such that π^m = 1, is the least common multiple of a_1, ..., a_k.
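
Proposition 3.27 is easy to test by machine. The Python sketch below (an added illustration with assumed helper names) computes the order of a permutation by repeated composition and compares it with the least common multiple of its cycle lengths, for every permutation of five points and for the example (2,5)(3,7,8)(6,9) given earlier, relabelled to act on {0, ..., 8}.

```python
from itertools import permutations
from math import gcd

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def cycle_lengths(p):
    """Sorted list of cycle lengths of p, fixed points included."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, point = 0, start
        while point not in seen:
            seen.add(point)
            point = p[point]
            length += 1
        lengths.append(length)
    return sorted(lengths)

def order(p):
    """Least positive m with p^m equal to the identity."""
    identity = tuple(range(len(p)))
    power, m = p, 1
    while power != identity:
        power = compose(power, p)
        m += 1
    return m

def lcm_of(values):
    result = 1
    for v in values:
        result = result * v // gcd(result, v)
    return result

# Check the proposition for every permutation of {0, ..., 4}.
assert all(order(p) == lcm_of(cycle_lengths(p)) for p in permutations(range(5)))

# The example (2,5)(3,7,8)(6,9) on {1,...,9}, with every point lowered by 1.
example = (0, 4, 6, 3, 1, 8, 7, 2, 5)
print(cycle_lengths(example), order(example))   # [1, 1, 2, 2, 3] 6
```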


Recall that two elements x, y of a group G are conjugate if y = g^{-1}xg for
some g ∈ G.


Theorem 3.28 Two elements of S_n are conjugate if and only if they have the
same cycle structure.


Proof Suppose first that y = g^{-1}xg. Consider any cycle of x, say
(p_1, p_2, ..., p_k). This means that x maps p_1 to p_2 to ... to p_k and back to
p_1. Let q_i = p_i g for i = 1, ..., k. We check that y maps q_1 to q_2 to ... to q_k
and back to q_1. For i < k, the effect of the composition g^{-1}xg on q_i maps it to
p_i (for p_i g = q_i, so q_i g^{-1} = p_i), then to p_{i+1}, then to q_{i+1}. Similarly, q_k goes
to q_1. So (q_1, q_2, ..., q_k) is a cycle of y. Every cycle arises thus, and the cycle
decomposition of y is obtained. So y has the same cycle structure as x.

For the converse, suppose that x and y have the same cycle structure. Write y
under x (both in cycle notation) so that cycles of y are under cycles of the same
length of x. (Include cycles of length 1 in this step.) Then let g be the permutation
that maps each point in the cycle notation for x to the point directly beneath it.
The argument of the preceding paragraph shows that g^{-1}xg = y, so that x and
y are conjugate.


Example Let x = (1,4,6)(2,5)(3) and y = (2,5,4)(1,6)(3). Then a
permutation g such that g^{-1}xg = y is given by


g = ( 1 2 3 4 5 6 )
    ( 2 1 3 5 6 4 )


in two-line notation, or (1,2)(4,5,6) in cycle notation. (It is pure coincidence
that the cycle structure of g is the same as that of x and y. There is nothing
unique about g; if we had written y as (5,4,2)(6,1)(3), we would have obtained


( 1 2 3 4 5 6 )
( 5 6 3 4 1 2 )  =  (1,5)(2,6).)
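
The recipe in the proof of Theorem 3.28 (write y beneath x and read off g) can be sketched in Python as follows. This is an added illustration, not the author's code; permutations are dictionaries on {1, ..., 6}, products are applied from left to right, and the function names are hypothetical.

```python
def from_cycles(cycles):
    """Build a permutation (as a dict) from a list of cycles."""
    perm = {}
    for cycle in cycles:
        for i, point in enumerate(cycle):
            perm[point] = cycle[(i + 1) % len(cycle)]
    return perm

def compose(*perms):
    """Apply the given permutations from left to right."""
    result = {}
    for point in perms[0]:
        image = point
        for p in perms:
            image = p[image]
        result[point] = image
    return result

def inverse(p):
    return {image: point for point, image in p.items()}

def conjugator(x_cycles, y_cycles):
    """Map each point of x's cycle notation to the point written beneath it."""
    g = {}
    for cx, cy in zip(x_cycles, y_cycles):
        assert len(cx) == len(cy)      # cycles must be aligned by length
        g.update(zip(cx, cy))
    return g

x_cycles = [(1, 4, 6), (2, 5), (3,)]
y_cycles = [(2, 5, 4), (1, 6), (3,)]
x, y = from_cycles(x_cycles), from_cycles(y_cycles)

g = conjugator(x_cycles, y_cycles)
assert compose(inverse(g), x, g) == y
print([g[i] for i in range(1, 7)])   # [2, 1, 3, 5, 6, 4]
```

Passing the alternative way of writing y, `[(5, 4, 2), (6, 1), (3,)]`, to `conjugator` gives the second permutation of the example, with bottom row 5 6 3 4 1 2.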


Let us use this result to compute the conjugacy classes in S_n for n = 4, 5,
and hence to find the normal subgroups of these groups.


The group S_4 We list the possible cycle structures, and the number of
elements of each possible structure, in the following table. (Ignore the column labelled
‘Parity’ for now.)


Cycle structure | # elements | Parity
1,1,1,1         |      1     |   E
1,1,2           |      6     |   O
2,2             |      3     |   E
1,3             |      8     |   E
4               |      6     |   O
Total           |     24     |


How do we calculate the numbers? There is a general formula, but in this case
it is easier to do it directly. There are (4 choose 2) = 6 permutations of cycle structure
1,1,2, since we must choose the two points to be transposed from {1, 2, 3, 4} and
then the other two are fixed. The number for cycle structure 2,2 is half of this,
since each such element is made up of two transpositions, and each transposition
occurs once in such a permutation. For type 1,3, there are four choices of the
fixed point, and two ways of permuting the other three in a 3-cycle. Finally,
consider 4-cycles. We can take each to start with 1 in cycle notation, as (1, ...); then
there are 3! = 6 ways to fill in the other three numbers, each giving a different cycle.
Of course, there is only one identity. As a check, these numbers add up to the
group order.
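
The counts in the table can also be confirmed by listing all 24 permutations. The following Python sketch (added here as a check, with an assumed helper `cycle_structure`) tallies the elements of S_4 by cycle structure.

```python
from collections import Counter
from itertools import permutations

def cycle_structure(p):
    """Sorted tuple of cycle lengths of p, fixed points included."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, point = 0, start
        while point not in seen:
            seen.add(point)
            point = p[point]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))

# Tally all permutations of {0, 1, 2, 3} by cycle structure.
counts = Counter(cycle_structure(p) for p in permutations(range(4)))
for structure, number in sorted(counts.items()):
    print(structure, number)
```

The tallies agree with the table: 1, 6, 3, 8, and 6 elements for the structures 1,1,1,1; 1,1,2; 2,2; 1,3; and 4.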

To find the normal subgroups of S_4, we use the fact that a subgroup is normal
if and only if it is a union of some of the conjugacy classes; of course the conjugacy 
class of the identity must be included. So first we solve the problem: how can 
we take some of the conjugacy class sizes which add up to a divisor of 24? (This 
last requirement comes from Lagrange’s Theorem.) 

If we include 1 but not 8, then (since all other class sizes are multiples of 
3) the sum is congruent to 1 mod 3. The only such divisors of 24 are 1 and 4. 
The first corresponds to the identity only (which is a normal subgroup), and the 
second to the identity and the three permutations of cycle structure 2,2 (which 
form the Klein group, also a normal subgroup). 




If we include both 1 and 8, the only possible divisors are 12 = 1 + 8 + 3,
and 24, the sum of all the class sizes. The latter corresponds to taking the whole
of S_4, which is trivially a normal subgroup. The former case also gives a normal
subgroup. This can be checked directly, but it is a special case of something much 
more general. 
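
The search for normal subgroups of S_4 can be carried out mechanically. The Python sketch below (an added illustration under the stated conventions, not the author's method) forms every union of conjugacy classes that contains the identity, discards those whose size does not divide 24, and tests the rest for closure under composition; a finite subset containing the identity and closed under composition is a subgroup.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def cycle_structure(p):
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, point = 0, start
        while point not in seen:
            seen.add(point)
            point = p[point]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))

G = list(permutations(range(4)))
identity = tuple(range(4))

# By Theorem 3.28, conjugacy classes are given by cycle structure.
classes = {}
for p in G:
    classes.setdefault(cycle_structure(p), set()).add(p)
others = [cls for key, cls in classes.items() if key != (1, 1, 1, 1)]

normal_orders = []
for r in range(len(others) + 1):
    for chosen in combinations(others, r):
        union = {identity}.union(*chosen)
        if len(G) % len(union) != 0:
            continue            # fails Lagrange's Theorem
        if all(compose(a, b) in union for a in union for b in union):
            normal_orders.append(len(union))

print(sorted(normal_orders))   # [1, 4, 12, 24]
```

The four orders correspond to the identity subgroup, the Klein group, the subgroup of order 12, and S_4 itself.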

We define the parity of a permutation π to be the parity of n − c(π),
where c(π) is the number of cycles in the cycle structure of π (including cycles
of length 1). It is either even or odd. Check that the parities of elements of
S_4 are correctly given in the above table. We also define the sign of a
permutation π to be (−1)^{n−c(π)}; so odd and even parity correspond to − and
+ sign. We saw this definition in Chapter 1; now we will see a remarkable
property.

We are going to show the following: 


Theorem 3.29 The map θ that takes a permutation π to its parity is a
homomorphism from S_n to the group ℤ_2 of integers mod 2. Its kernel, the
set of all permutations of even parity, is a normal subgroup of S_n, having
index 2.


Proof There are several steps to the proof. 


Step 1 First we show that, if τ is a transposition (a permutation
interchanging two points and fixing the rest), then π and πτ have opposite
parity. To see this, we have to count cycles of πτ. These are the same as the
cycles of π except for the ones containing the two points interchanged by τ, say
i and j. Now check that, if i and j are in the same cycle of π, this splits into
two cycles of πτ, while if they are in different cycles of π, these cycles get ‘glued
together’ in πτ.

So c(πτ) = c(π) ± 1, and the difference of 1 in either direction changes the
parity.


Step 2 Any permutation is a product of transpositions. There are
many different expressions, using different numbers of transpositions; but the
parity of the number of transpositions needed to express π is equal to the
parity of π.

To see that any permutation is a product of transpositions, we only need to
check this for one cycle, since we can deal with the cycles separately. Now verify
that


(1, 2, 3, ..., n) = (1,2)(1,3) ··· (1,n).


The identity has n cycles of length 1, so its parity is even; and thus, by Step 1,
the parity of a product of k transpositions is equal to the parity of k.


Step 3 Parity is a homomorphism. We need to show that the parity of π_1π_2 is
the sum (mod 2) of the parities of π_1 and π_2. But this is clear from Step 2, on
expressing π_1 and π_2 as products of transpositions.


The normal subgroup of S_n consisting of permutations with even parity is
called the alternating group, written A_n. Its order is n!/2. (The name comes
from a different description of A_n. A function f(x_1, x_2, ..., x_n) is called
alternating if it changes sign when two of its variables are interchanged. The simplest
example of such a function is ∏_{i<j} (x_i − x_j). Now the alternating group consists
of all permutations which, when applied to the variables x_1, ..., x_n, leave the
value of an alternating function unchanged. The symmetric group is related to
‘symmetric functions’ in the same way.)


Remark It follows from the theorem that sign is a homomorphism from S_n to
the multiplicative group {+1, −1}.
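
Theorem 3.29 and the Remark can be checked exhaustively for small n. The Python sketch below (added for illustration; function names are assumptions) computes the parity of n − c(π) for every element of S_4 and verifies that parity is additive mod 2 and that sign is multiplicative.

```python
from itertools import permutations

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def parity(p):
    """Parity of n - c(p), where c(p) counts all cycles, fixed points included."""
    seen, cycles = set(), 0
    for start in range(len(p)):
        if start not in seen:
            cycles += 1
            point = start
            while point not in seen:
                seen.add(point)
                point = p[point]
    return (len(p) - cycles) % 2

def sign(p):
    return (-1) ** parity(p)

S4 = list(permutations(range(4)))

# Parity is a homomorphism to the integers mod 2.
assert all(parity(compose(p, q)) == (parity(p) + parity(q)) % 2
           for p in S4 for q in S4)
# Sign is a homomorphism to {+1, -1}.
assert all(sign(compose(p, q)) == sign(p) * sign(q) for p in S4 for q in S4)

even = [p for p in S4 if parity(p) == 0]
print(len(even))   # 12
```

The kernel has 12 = 4!/2 elements, so it has index 2 in S_4.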


Now we proceed to find the normal subgroups of S_5. First, we list the
conjugacy classes. (For convenience we have given them names.)


Name | Cycle structure | # elements | Parity
C_1  | 1,1,1,1,1       |      1     |   E
C_2  | 1,1,1,2         |     10     |   O
C_3  | 1,2,2           |     15     |   E
C_4  | 1,1,3           |     20     |   E
C_5  | 2,3             |     20     |   O
C_6  | 1,4             |     30     |   O
C_7  | 5               |     24     |   E
Total|                 |    120     |


A normal subgroup N of S_5 is a union of conjugacy classes, including the
class C_1. If N does not include C_7, then |N| ≡ 1 (mod 5), and |N| divides 24.
The only such divisors are 1 and 6, and 6 is impossible because any class other
than C_1 has at least 10 elements; whence |N| = 1 and N = {1}. If, however, N
does include C_7, then |N| ≥ 25, whence |N| = 40, 60, or 120, and N also
includes C_3. Now


(1,2)(3,4) · (1,2)(3,5) = (3,4,5),


so if N contains C_3, it also contains at least one element of C_4, whence N contains
C_4. This leaves only two possibilities: either N = C_1 ∪ C_3 ∪ C_4 ∪ C_7 = A_5, or
N = S_5. We conclude:


Proposition 3.30 The only normal subgroups of S_5 are {1}, A_5, and S_5.


3.14 Symmetry groups. Some further examples of groups arise geometri- 
cally, as groups of symmetries of polygons and polyhedra. 

Let P be a regular polygon with n vertices. Assume that its centre is at the 
origin. A symmetry of the polygon is a transformation which maps vertices 
of the polygon bijectively to vertices, and edges to edges. Alternatively, we can 
think of a symmetry as a transformation of the Euclidean space which maps the 
vertices and edges of the polygon to themselves. 

There are two types of symmetries: rotations about the origin through
multiples of 2π/n radians; and reflections in ‘axes of symmetry’ of the polygon.
There are clearly n rotations, each of which is obtained by composing the
rotation through 2π/n an appropriate number of times. Hence, the rotations form
a cyclic group of order n. Also, there are n reflections. If n is odd, each axis of
symmetry joins a vertex to the midpoint of the opposite side. If n is even,
however, there are two types of axes of symmetry: one type joins a pair of opposite
vertices (there are n/2 of these), the other joins the midpoints of a pair of
opposite sides (and there are also n/2 of these). The full group of symmetries is called
the dihedral group of order 2n, written D_{2n} (but note that some people call
it D_n).
Figure 3.2 shows the axes of symmetry in the two cases.


Theorem 3.31 The symmetry group of a regular n-gon is the dihedral group 
of order 2n. It has a cyclic normal subgroup of order n consisting of rotations; 
every element outside this subgroup has order 2. 


How do we represent symmetries? One method is to regard the symmetry 
group as consisting of permutations of the vertices; we number the vertices from 
1 to n and write down the permutations. Another is to think of them (as described 
above) as Euclidean transformations. Since we chose the polygon to have its 
centre at the origin, these transformations can be represented by matrices. 

Fig. 3.2 Axes of symmetry


Fig. 3.3 Symmetries of a square


For example, consider the square with vertices at (±1, ±1), shown in
Figure 3.3. Number its vertices anti-clockwise starting from the top left, as in
the figure. Then the elements of the group of symmetries are as follows, as either
permutations or matrices (rotations first, then reflections):


(1), (1,2,3,4), (1,3)(2,4), (1,4,3,2),

(1,4)(2,3), (1,2)(3,4), (2,4), (1,3)


( 1  0 )   (  0  1 )   ( -1   0 )   ( 0  -1 )
( 0  1 ),  ( -1  0 ),  (  0  -1 ),  ( 1   0 ),

( 1   0 )   ( -1  0 )   ( 0  1 )   (  0  -1 )
( 0  -1 ),  (  0  1 ),  ( 1  0 ),  ( -1   0 ).

Note that a rotation has determinant +1, while a reflection has
determinant −1. The map A ↦ det(A) is a homomorphism from the symmetry group
of the regular n-gon onto the multiplicative group {+1, −1} (cyclic of order 2);
the kernel is the rotation group (cyclic of order n).
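
For the square, the statement about determinants can be verified directly. The Python sketch below (an added illustration) takes the four rotation matrices through multiples of π/2 and the four reflection matrices of the square with vertices (±1, ±1), written as tuples of rows, and checks closure, multiplicativity of the determinant, and that the kernel consists of the rotations. The helper names `matmul` and `det` are hypothetical.

```python
def matmul(a, b):
    """Product of two 2 x 2 integer matrices given as tuples of rows."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

rotations = [
    ((1, 0), (0, 1)), ((0, 1), (-1, 0)), ((-1, 0), (0, -1)), ((0, -1), (1, 0)),
]
reflections = [
    ((1, 0), (0, -1)), ((-1, 0), (0, 1)), ((0, 1), (1, 0)), ((0, -1), (-1, 0)),
]
group = rotations + reflections

# The eight matrices are closed under multiplication.
closed = all(matmul(a, b) in group for a in group for b in group)
# det is multiplicative, so A -> det(A) is a homomorphism onto {+1, -1}.
multiplicative = all(
    det(matmul(a, b)) == det(a) * det(b) for a in group for b in group
)
kernel = [a for a in group if det(a) == 1]

print(closed, multiplicative, kernel == rotations)   # True True True
```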

We have only defined the dihedral group D_{2n} above for n ≥ 3. However, with
the geometrical approach, we can extend the definition to n = 1 and n = 2 as
well, taking rotations through multiples of 2π/n and corresponding reflections.
For n = 1, we have the identity rotation and one reflection, giving D_2 ≅ C_2. For
n = 2, we have a group of order 4 isomorphic to the Klein group, containing two
rotations and two reflections: in matrix terms,


D_4 = { ( 1  0 )   ( -1   0 )   ( 1   0 )   ( -1  0 ) }
      { ( 0  1 ),  (  0  -1 ),  ( 0  -1 ),  (  0  1 ) }.


In three dimensions, the figures analogous to regular polygons are regular
polyhedra, which have regular n-gons for faces and regular m-gons for ‘vertex
figures’ (obtained by slicing off a corner). There are only five of these, the
so-called Platonic solids. To see that there cannot be more than five, recall that
the internal angle in a regular n-gon is π(1 − 2/n). The angles of the m faces
surrounding a vertex must add up to strictly less than 2π. (If the sum was 2π,
as for four squares or three hexagons, the figure would lie flat and not fold up;
more than 2π would be even further from creating a polyhedron.) So we have


mπ(1 − 2/n) < 2π,


whence 1/m + 1/n > 1/2. This inequality has the solutions (m, n) = (3,3), (3,4),
(4,3), (3,5), and (5,3).
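
The list of solutions is easily confirmed by a finite search. In the Python sketch below (an added check), m and n are assumed to be at least 3, since a polygon has at least three sides and at least three faces meet at a vertex; the upper bound of 50 is an arbitrary cut-off, harmless because 1/m + 1/n ≤ 1/2 as soon as either m or n is 6 or more.

```python
from fractions import Fraction

# Search for pairs (m, n) with m, n >= 3 and 1/m + 1/n > 1/2,
# using exact rational arithmetic to avoid rounding issues.
solutions = [
    (m, n)
    for m in range(3, 50)
    for n in range(3, 50)
    if Fraction(1, m) + Fraction(1, n) > Fraction(1, 2)
]
print(solutions)   # [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)]
```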

This shows that not more than five regular polyhedra can exist. But we 
can construct models of each of the five. In the order described above, they are 
the tetrahedron, hexahedron (cube), octahedron, dodecahedron, and icosahedron 
respectively. The Greek prefixes in the names of these figures stand for the total 
numbers of faces they have, namely 4, 6, 8, 12, and 20 respectively. Figures 3.4
and 3.5 show the regular polyhedra.


Fig. 3.4 Tetrahedron, cube, and octahedron


Fig. 3.5 Dodecahedron and Icosahedron


Theorem 3.32 The properties of the five Platonic polyhedra are given in the 
following table: 


Name           Faces   Edges   Vertices   Rotation group   Symmetry group
Tetrahedron      4       6        4       $A_4$            $S_4$
Cube             6      12        8       $S_4$            $S_4 \times C_2$
Octahedron       8      12        6       $S_4$            $S_4 \times C_2$
Dodecahedron    12      30       20       $A_5$            $A_5 \times C_2$
Icosahedron     20      30       12       $A_5$            $A_5 \times C_2$


Proof You will be greatly helped in following this proof if you have models of 
the polyhedra available as you read. The models make the first three columns 
of numbers clear. The other thing to notice is that there is a ‘duality’ relation 
between the cube and the octahedron. If we take the six points at the centres 
of the faces of a cube as vertices, we obtain an octahedron, and vice versa. This 
explains why the number of faces of the cube is equal to the number of vertices 
of the octahedron, and vice versa. It also implies that these two figures have the 
same rotation group and the same symmetry group (thinking of these groups 
as Euclidean transformations fixing the figures in question). A similar duality 
relation holds between the dodecahedron and the icosahedron. The tetrahedron 
is ‘self-dual’: if we put vertices at its face centres, we obtain another tetrahedron. 

Now, in each case, the order of the rotation group is the product of the
number of faces and the number of edges of a face. For imagine that we have a
frame on the table into which a face of the solid fits. We can specify a rotation
by saying which face should go into the frame, and in which orientation. Thus
we find that the rotation groups have orders 12, 24, 24, 60, 60 respectively. Also,
in all cases the map $A \mapsto \det(A)$ is a homomorphism from the symmetry group
onto the group $\{\pm 1\}$ whose kernel is the rotation group; so the symmetry group
is twice as large as the rotation group.
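
The counting rule is easy to tabulate. The sketch below is illustrative: the face, edge and vertex counts are those of Theorem 3.32, and the number of edges per face comes from the pairs (m, n) found earlier. The names `SOLIDS` and `rotation_order` are introduced here. Two further checks are included as sanity tests that are not part of the argument in the text: every edge lies on exactly two faces, and Euler's formula V - E + F = 2.

```python
# (name, faces, edges, vertices, edges per face)
SOLIDS = [
    ("tetrahedron", 4, 6, 4, 3),
    ("cube", 6, 12, 8, 4),
    ("octahedron", 8, 12, 6, 3),
    ("dodecahedron", 12, 30, 20, 5),
    ("icosahedron", 20, 30, 12, 3),
]

def rotation_order(faces, edges_per_face):
    """Choose the face that goes into the frame, then its orientation."""
    return faces * edges_per_face

for name, f, e, v, n in SOLIDS:
    rot = rotation_order(f, n)
    assert f * n == 2 * e      # each edge is shared by two faces
    assert v - e + f == 2      # Euler's formula (extra check)
    print(f"{name}: rotations {rot}, symmetries {2 * rot}")
```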

It remains to identify the groups. 

For the tetrahedron, there are four vertices, which are permuted by any
symmetry; since there are 24 = 4! symmetries, the symmetry group must be $S_4$. The
rotation group is a normal subgroup of index 2, which must be the alternating
group $A_4$ (either by our determination of all normal subgroups of $S_4$ in the last
section, or by inspection).

For the other figures, the argument is more dependent on the models. A 
cube has four diagonals (joining opposite vertices) which are permuted by its 
symmetries. No non-identity rotation can fix all the diagonals [why?], so the
rotation group is $S_4$. However, the reflection represented by the matrix $-I$ (inversion
in the centre) fixes all diagonals. In fact, this matrix commutes with all
transformations in the group, so $\{\pm I\}$ is the centre of the symmetry group, and is
a normal subgroup. From this it can be deduced that the symmetry group is
$S_4 \times C_2$ (Exercise 3.40).

In the remaining case, the argument requires a more elaborate model, or very
good geometrical intuition! It is possible to ‘inscribe’ a cube into a dodecahedron,
so that the vertices of the cube are eight of the twenty vertices of the
dodecahedron, in just five different ways. These five inscribed cubes are permuted among
themselves by the rotations of the icosahedron. Thus, the rotation group is a
subgroup of order 60 of $S_5$. Such a subgroup has index 2, and hence is normal,
so is necessarily $A_5$. The proof that the symmetry group is $A_5 \times C_2$ is much as
before. (Another approach is given in Exercise 3.41.)


Exercise 3.29 Show that the set 


{(1), (1, 2), (3, 4), (1, 2)(3, 4)}


of permutations also forms a group isomorphic to the Klein group. 


Exercise 3.30 Let $\pi$ be a permutation of $\{1,\ldots,n\}$. Define the permutation matrix
$P(\pi)$ as follows: $P(\pi)$ is an $n \times n$ matrix whose $(i, j)$ entry is equal to 1 if $i\pi = j$, and
is 0 otherwise. So each row or column of $P(\pi)$ contains exactly one entry 1.

Prove that $P(\pi_1\pi_2) = P(\pi_1)P(\pi_2)$. Hence show that every finite permutation group
is isomorphic to a matrix group (over any field).
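
Before attempting the proof, it can help to test the identity on small cases. The Python sketch below is a finite check, not a proof. It assumes a representation chosen here: points are numbered from 0, a permutation is a tuple `p` with `p[i]` the image of `i`, and the product `pq` means "apply `p` first, then `q`", matching the right-action notation $i\pi$ of the exercise. All function names are illustrative.

```python
from itertools import permutations

def perm_matrix(p):
    """P(p): entry (i, j) is 1 if p maps i to j, and 0 otherwise."""
    n = len(p)
    return [[1 if p[i] == j else 0 for j in range(n)] for i in range(n)]

def compose(p, q):
    """Product pq in the right-action convention: apply p, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def mat_mul(a, b):
    """Ordinary product of two square integer matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def check_homomorphism(n):
    """Check P(pq) == P(p) P(q) for every pair p, q in S_n."""
    perms = list(permutations(range(n)))
    return all(
        perm_matrix(compose(p, q)) == mat_mul(perm_matrix(p), perm_matrix(q))
        for p in perms for q in perms
    )

print(check_homomorphism(3))
```

With the opposite composition convention (apply `q` first), the same check would instead confirm P(pq) = P(q)P(p), which is why the convention has to be fixed before the identity is stated.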


Exercise 3.31 Let $G = \{g_1,\ldots,g_n\}$ be a group. Show that the Cayley table of G is an
$n \times n$ array with entries $g_1,\ldots,g_n$ with the following property: each element $g_i$ occurs
exactly once in any row or column of the array. (Arrays with this property are called
Latin squares. They are used in design of experiments in statistics.)

Show that G is abelian if and only if its Cayley table is symmetric (equal to its 
transpose). 


Exercise 3.32 Let G and H be groups. Let $G \times H$ be the Cartesian product of the
sets G and H (the set of all ordered pairs (g, h), with $g \in G$ and $h \in H$). Define an
operation on $G \times H$ by the rule

$$(g_1, h_1)(g_2, h_2) = (g_1g_2, h_1h_2).$$

Prove that $G \times H$ is a group. (This group is called the direct product of the groups
G and H.)

Prove that $|G \times H| = |G| \cdot |H|$.

Prove that the direct product of abelian groups is abelian. 


Exercise 3.33 (a) Prove that the Klein group $V_4$ is isomorphic to $C_2 \times C_2$.
(b) Prove that $C_2 \times C_3 \cong C_6$.
(c) Among groups of order 8, we find $C_8$, $C_4 \times C_2$, and $C_2 \times C_2 \times C_2$. Identify them
in the analysis of groups of order 8 given above.
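
Element orders are a convenient tool for exercises of this kind. The following sketch is an illustration, not a solution: it assumes that the cyclic group $C_m$ is modelled additively as the integers mod m, so that a direct product of cyclic groups becomes a set of tuples with componentwise addition. The function names are chosen here.

```python
from itertools import product
from math import gcd

def element_order(g, moduli):
    """Order of g = (g_1, ..., g_k) in Z_{m_1} x ... x Z_{m_k} (additive)."""
    order = 1
    for x, m in zip(g, moduli):
        o = m // gcd(x, m)                  # order of x in Z_m
        order = order * o // gcd(order, o)  # least common multiple so far
    return order

def order_counts(moduli):
    """Map each element order to the number of elements having it."""
    counts = {}
    for g in product(*(range(m) for m in moduli)):
        k = element_order(g, moduli)
        counts[k] = counts.get(k, 0) + 1
    return counts

def is_cyclic(moduli):
    """A finite group is cyclic exactly when some element has order |G|."""
    size = 1
    for m in moduli:
        size *= m
    return size in order_counts(moduli)

print(is_cyclic((2, 3)), is_cyclic((2, 2)))
for moduli in [(8,), (4, 2), (2, 2, 2)]:
    print(moduli, "elements of order 2:", order_counts(moduli).get(2, 0))
```

The three abelian groups of order 8 have 1, 3 and 7 elements of order 2 respectively, so counting involutions is enough to tell them apart.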


Exercise 3.34 (a) Prove that the eight permutations 
(1), (1, 2, 3, 4), (1, 3)(2, 4), (1,4, 3, 2), 


(1, 2)(3, 4), (1, 3), (1, 4)(2, 3), (2, 4) 


form a non-abelian group. [Hint: Construct a Cayley table.] 
(b) Prove that the eight matrices 


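$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\ \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix},\ \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix},\ \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix},$$
$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},\ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},\ \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix},\ \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}$$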


over the complex numbers form a non-abelian group, not isomorphic to the group in 
(a). [Hint: Count elements of order 2.] 


Note These two groups are called the dihedral group and the quater- 
nion group of order 8, respectively. The quaternion group arises from the 
quaternions discovered by W. R. Hamilton (see Exercise 2.11). If i,j,k are 
the quaternion ‘units’, satisfying 


$$i^2 = j^2 = k^2 = ijk = -1,$$

then the quaternion group consists of the eight elements

$$\{\pm 1, \pm i, \pm j, \pm k\}.$$
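
The defining relations can be checked concretely. The sketch below assumes one standard realisation of the quaternion units as 2 x 2 complex matrices; this particular choice of matrices, and the names `QI`, `QJ`, `QK`, are assumptions made for the illustration. It verifies the relations, closure, and non-commutativity, and counts the elements of order 2.

```python
def mul(a, b):
    """Product of 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )

def neg(a):
    """Negative of a matrix."""
    return tuple(tuple(-x for x in row) for row in a)

ONE = ((1, 0), (0, 1))
QI = ((1j, 0), (0, -1j))   # assumed matrix for the unit i
QJ = ((0, 1), (-1, 0))     # assumed matrix for the unit j
QK = mul(QI, QJ)           # k = ij

Q8 = [ONE, neg(ONE), QI, neg(QI), QJ, neg(QJ), QK, neg(QK)]

relations_hold = (
    mul(QI, QI) == neg(ONE)
    and mul(QJ, QJ) == neg(ONE)
    and mul(QK, QK) == neg(ONE)
    and mul(mul(QI, QJ), QK) == neg(ONE)
)
closed = all(mul(a, b) in Q8 for a in Q8 for b in Q8)
non_abelian = mul(QI, QJ) != mul(QJ, QI)
involutions = [a for a in Q8 if a != ONE and mul(a, a) == ONE]

print(relations_hold, closed, non_abelian, len(involutions))
```

Only $-1$ has order 2 here, whereas the permutation group of Exercise 3.34(a) contains five elements of order 2, namely (1, 3)(2, 4), (1, 2)(3, 4), (1, 3), (1, 4)(2, 3) and (2, 4); this is the count that the hint refers to.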


Exercise 3.35 (a) Prove that, if p and q are distinct primes, then the direct product
of $\mathbb{Z}_p$ and $\mathbb{Z}_q$ is isomorphic to $\mathbb{Z}_{pq}$.
(b) Prove that, if p is prime, then the direct product of $\mathbb{Z}_p$ with itself is not
isomorphic to $\mathbb{Z}_{p^2}$.


Exercise 3.36 Show that the alternating group $A_4$, of order 12, has no subgroup of
order 6.

(This shows that the converse of Lagrange’s Theorem is false; it is not true in 
general that, if G is a group of order n, and m divides n, then G has a subgroup of 
order m.) 


Exercise 3.37 Let G be the dihedral group of order 8, the group of symmetries of a 
square. Let z be the symmetry which is a rotation through 180°. Verify that the centre 
Z(G) is the subgroup {1, z}. Now G/Z(G) is a group of order 4: is it the cyclic group 
or the Klein group? 


Exercise 3.38 (*) There are three partitions of the set {1, 2, 3, 4} into two sets of size
2, namely,

• A = {{1, 2}, {3, 4}};
• B = {{1, 3}, {2, 4}};
• C = {{1, 4}, {2, 3}}.

Any permutation g of {1, 2, 3, 4} induces a permutation g* of the set {A, B, C}. For
example, if g is the cyclic permutation (1, 2, 3, 4), then g* = (A, C)(B).

Show that the map $\theta: S_4 \to S_3$ defined by $\theta(g) = g^*$ is a homomorphism. Describe
the image and kernel of $\theta$, and check that

$$|\mathrm{Im}(\theta)| \cdot |\mathrm{Ker}(\theta)| = |S_4|.$$
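
The induced action can be computed mechanically, which gives a way of checking an answer to this exercise. In the sketch below, which is only an illustration, a permutation is a dictionary from points to images, products are read left to right (apply g, then h), and all names (`PARTS`, `induced`, and so on) are chosen here.

```python
from itertools import permutations

# The three partitions of {1, 2, 3, 4} into two sets of size 2.
PARTS = {
    "A": frozenset({frozenset({1, 2}), frozenset({3, 4})}),
    "B": frozenset({frozenset({1, 3}), frozenset({2, 4})}),
    "C": frozenset({frozenset({1, 4}), frozenset({2, 3})}),
}
NAMES = {part: name for name, part in PARTS.items()}

def induced(g):
    """Return g* as a dict on the labels 'A', 'B', 'C'."""
    return {
        name: NAMES[frozenset(frozenset(g[x] for x in block) for block in part)]
        for name, part in PARTS.items()
    }

def compose(g, h):
    """Apply g first, then h."""
    return {x: h[g[x]] for x in g}

S4 = [dict(zip((1, 2, 3, 4), img)) for img in permutations((1, 2, 3, 4))]

is_hom = all(
    induced(compose(g, h))
    == {x: induced(h)[induced(g)[x]] for x in "ABC"}
    for g in S4 for h in S4
)
image = {tuple(induced(g)[x] for x in "ABC") for g in S4}
kernel = [g for g in S4 if all(induced(g)[x] == x for x in "ABC")]

print(is_hom, len(image), len(kernel), len(S4))
```

For the 4-cycle `{1: 2, 2: 3, 3: 4, 4: 1}` the function `induced` swaps A and C and fixes B, as in the example in the exercise.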


Exercise 3.39 (*) Let G be a group having two normal subgroups N and M with
the properties that NM = G and $N \cap M = \{1\}$. Show that an element of M and an
element of N commute. Show that any element of G can be written uniquely in the
form nm, for $n \in N$ and $m \in M$. Hence show that $G \cong N \times M$.

Verify that these conditions are satisfied when G is the symmetry group of a cube,
N the rotation group, and $M = \{\pm I\}$.


Exercise 3.40 Calculate the conjugacy classes in the rotation group of the cube (in 
terms of type of axis of rotation and angle of rotation), and match them up with the 
conjugacy classes in the symmetric group S4. 


Exercise 3.41 Take a model of either a dodecahedron or an icosahedron; pick it up 
by one edge, and hold it with this edge horizontal at the ‘north pole’. Check that, 
on the ‘equator’, there are two horizontal edges at antipodal points (in the direction of 
the ends of the top edge), and two vertical edges at the intermediate points, while at 
the ‘south pole’ there is an edge parallel to the one at the ‘north pole’. 

This means that any edge belongs to a unique set of six edges in three mutually 
perpendicular directions. The thirty edges thus define five such sets of six (or ‘frames’, 
as we shall call them). 

Show that any rotation induces a permutation on the set of five frames, and no 
rotation except the identity can fix all five frames. 

Deduce that the rotation group of the figure is isomorphic to $A_5$.


Appendix: How many groups?


We have seen that the number of binary operations on a set of n elements is $n^{n^2}$.
These are systems satisfying the axiom (G0).

Even if we count them up to isomorphism, we obtain a very rapidly growing
function. For the number of structures on a set of n elements isomorphic to a
given one is at most the number n! of bijections of the set; so up to isomorphism
the number $F_0(n)$ of binary systems satisfies

$$F_0(n) \ge n^{n^2}/n! \ge n^{n^2}/n^n = n^{n^2 - n}.$$


One measure of the strength of the group axioms is to estimate how many 
structures satisfy various collections of axioms. 

It is clear that the identity and inverse laws alone are not very powerful.
If we take $A = \{a_1,\ldots,a_n\}$, where $a_1$ is the identity, then the first row and
column of the operation table are determined, but the remaining entries are
arbitrary; so there are $n^{(n-1)^2}$ such structures, and at least $n^{(n-1)^2}/(n-1)!$ up
to isomorphism. Similarly, we get a lower bound for the number satisfying the
inverse law by counting just those for which each element is its own inverse; there
are $n^{(n-1)(n-2)}$ of these, so at least $n^{(n-1)(n-2)}/(n-1)!$ up to isomorphism.

A better approach is to consider the cancellation laws. Notice that we used the
associative law as well as the identity and inverse laws to prove the cancellation
laws in a group. (For example, if ab = ac, then $a^{-1}(ab) = a^{-1}(ac)$; using the
associative law, $(a^{-1}a)b = (a^{-1}a)c$, whence b = c.) Notice, too, that in a finite
structure, the identity and cancellation laws imply the existence of inverses. For
the cancellation law implies that the map $x \mapsto ax$ is one-to-one; in a finite
structure, this map is also onto, so there exists x such that ax = 1. However,
we need the associative law to prove that left and right inverses are equal: if
ax = 1 = ya, then x = (ya)x = y(ax) = y.

Accordingly, we define a quasigroup to be a set with a binary operation
$\circ$ in which the equations ax = b and ya = b have unique solutions x and y
for any given a and b. Another way of saying the same thing is that the operation 
table has the property that each element occurs exactly once in each row or 
column. 

A Latin square is an n x n array containing n different entries, such that 
each entry occurs exactly once in each row and once in each column. Thus, a 
set with a binary operation is a quasigroup if and only if its operation table is a 
Latin square. 
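
The correspondence between quasigroups and Latin squares is easy to test on small examples. The sketch below is illustrative, and the helper names are chosen here. It uses subtraction mod 5 as an example of a quasigroup that is not a group, and multiplication mod 5 (including 0) as an example that is not a quasigroup.

```python
def operation_table(elements, op):
    """Rows are indexed by the left operand, columns by the right one."""
    return [[op(a, b) for b in elements] for a in elements]

def is_latin_square(table):
    """Every symbol occurs exactly once in each row and each column."""
    n = len(table)
    symbols = set(table[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in table)
    cols_ok = all({table[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok

def is_associative(elements, op):
    """Brute-force check of (ab)c == a(bc) for all triples."""
    return all(
        op(op(a, b), c) == op(a, op(b, c))
        for a in elements for b in elements for c in elements
    )

Z5 = list(range(5))

def sub_mod5(a, b):
    return (a - b) % 5

def mul_mod5(a, b):
    return (a * b) % 5

print(is_latin_square(operation_table(Z5, sub_mod5)))  # quasigroup?
print(is_associative(Z5, sub_mod5))                    # associative?
print(is_latin_square(operation_table(Z5, mul_mod5)))  # quasigroup?
```

Subtraction gives a Latin square because a - x = b and y - a = b each have exactly one solution mod 5, yet (0 - 0) - 1 and 0 - (0 - 1) differ, so the associative law fails.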

In addition, we define a loop to be a quasigroup with identity. Taking the 
identity to be the first element, this requires that the entries in the first row and 
column of the operation table are equal to the row and column labels. 

So we see that the numbers of quasigroups and loops with n elements are
each at least L(n)/n!, where L(n) is the number of Latin squares of order n. The
value of L(n) is not known precisely, but it is known to be at least $(cn)^{n^2}$ for
some positive constant c. As before, dividing by n! doesn’t have much effect.

By contrast, the number of groups is much smaller: 

Theorem 3.33 The number of non-isomorphic groups of order n does not
exceed $n^{n \log_2 n}$.


Proof First, we present some terminology. In a group G, the subgroup
generated by the elements $g_1,\ldots,g_r$ is the smallest subgroup containing these
elements; we say that $g_1,\ldots,g_r$ generate G if there is no proper subgroup of
G which contains them. We write $G = \langle g_1,\ldots,g_r \rangle$ in this case, generalising
the notation for cyclic groups. Another characterisation of this is: every element
of G can be written as a product of elements chosen from $g_1,\ldots,g_r$ and their
inverses.


Step 1 A group of order n can be generated by at most $\log_2 n$ elements.

To show this, choose elements $g_1, g_2, \ldots$ so that $g_1$ is not the identity and
$g_{i+1}$ is not in the subgroup $H_i$ generated by $g_1,\ldots,g_i$ for each $i \ge 1$, as long
as possible. Then $H_1 = \langle g_1 \rangle$ is not the identity, so $|H_1| \ge 2$. Also, since $H_{i+1}$
properly contains $H_i$, its order is a proper multiple of that of $H_i$ by Lagrange’s
Theorem, and so $|H_{i+1}| \ge 2|H_i|$. By induction, $|H_i| \ge 2^i$ for all i. When the
process terminates, we have $H_i = G$, so $n \ge 2^i$, or $i \le \log_2 n$; and, by construction,
G can be generated by i elements.
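
The greedy choice of generators in Step 1 can be carried out by machine for a small group. The sketch below is an illustration under assumptions made here: the group is $S_4$, its elements are tuples of images of 0, 1, 2, 3, products are read left to right, and the function names are invented for the example. A subgroup is computed by closing under multiplication by the generators, which suffices in a finite group because every inverse is a positive power.

```python
from itertools import permutations
from math import log2

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)

def generated_subgroup(gens, identity):
    """Closure of the generators under multiplication (finite group)."""
    elems = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in elems:
                elems.add(y)
                frontier.append(y)
    return elems

def greedy_generators(group, identity):
    """Repeatedly add an element outside the subgroup generated so far."""
    gens, orders = [], []
    subgroup = {identity}
    for g in group:
        if g not in subgroup:
            gens.append(g)
            subgroup = generated_subgroup(gens, identity)
            orders.append(len(subgroup))
    return gens, orders

S4 = list(permutations(range(4)))
gens, orders = greedy_generators(S4, (0, 1, 2, 3))
print(gens, orders, log2(len(S4)))
```

Each subgroup order in the chain is at least twice the previous one, so the number of generators found cannot exceed $\log_2 24$. The greedy procedure need not find a smallest generating set: $S_4$ can in fact be generated by two elements.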


Step 2 By Cayley’s Theorem, there is a subgroup of the symmetric group $S_n$
which is isomorphic to G. Thus, there is an isomorphism $\theta$ from G to a subgroup
of $S_n$. If $G = \langle g_1,\ldots,g_r \rangle$, then any element of G is a product of some of $g_1,\ldots,g_r$
and their inverses. So the image of this element under $\theta$ is determined by the
images of $g_1,\ldots,g_r$, which are r elements of $S_n$.

Hence the number of groups of order n is not greater than the number of
choices of r elements of $S_n$, where $r = \lfloor \log_2 n \rfloor$ (the integer part of $\log_2 n$). This
number is $(n!)^r$, which is at most $(n!)^{\log_2 n}$.

Finally, n!, which is the product of the n numbers from 1 to n, is not greater
than $n^n$. So the number of groups of order n does not exceed $n^{n \log_2 n}$.


In the above proof, the essential ingredients are Lagrange’s Theorem and
Cayley’s Theorem, two of the most basic results about groups. Using much
more advanced group theory, this result has been improved. It is known that
the number of groups of order n is at most $n^{c(\log n)^2}$ for some positive constant
c. In other words, the exponent is reduced from $n \log n$ to the much smaller
$c(\log n)^2$.

Even Cayley’s Theorem is not essential for this proof. If we are building
the Cayley table of G, it is enough to construct the rows corresponding to the
generating elements $g_1,\ldots,g_r$, since all other products can then be computed
using the associative law. Now the number of $r \times n$ tables does not exceed
$n^{rn} \le n^{n \log_2 n}$.

On the basis of this theorem, we might be tempted to conclude that the 
associative law is the most powerful of the group axioms. We define a semigroup 
to be a set with a binary operation satisfying the associative law, that is, a 
structure in which axioms (G0) and (G1) hold.

It is beyond the scope of this book to give an estimate for the number of
semigroups with n elements. However, the numbers of small semigroups have 
been calculated. The table below shows that the number grows much faster than 
the number of groups. The numbers in the table are up to isomorphism. 

So it is more accurate to say that the combination of all the group axioms is 
more than the sum of its parts. 


n   Operations                    Quasigroups   Loops   Semigroups   Groups
1   1                             1             1       1            1
2   10                            1             1       5            1
3   3330                          5             1       24           1
4   178981952                     35            2       188          2
5   2483527537094825              1411          6       1915         1
6   14325590003318891522275680    1130531       109     28634        2


These numbers are taken from Neil Sloane’s On-Line Encyclopedia of Integer 
Sequences, on the web at http://www.research.att.com/~njas/sequences/ 

Recently, a team of group theorists (Hans Ulrich Besche, Bettina Eick and 
Eamonn O’Brien) marked the end of the second millennium by counting the 
groups with order at most 2000 up to isomorphism. There are 49 910 529 484
groups, of which 49 487 365 422 have order $2^{10} = 1024$. By contrast, of course,
if p is prime, there is only one group of order p. So the counting function for 
groups is very erratic. 


4 Vector spaces 


By the time you reach this point, you will probably have met vector spaces in 
another course: perhaps matrices, geometry, mechanics, or linear algebra. The 
treatment here may be somewhat different. A vector space is an algebraic object 
like a ring or a group, and we will start with a collection of axioms, as we did 
in the chapters on rings and groups. Then we turn to subspaces and homomor- 
phisms, and find that homomorphisms between vector spaces can be represented 
by matrices, and indeed by matrices of a particularly simple form. In the last 
section, we see that for matrices with elements in a Euclidean domain (such as 
the integers), similar results apply. This result will seem a bit unmotivated; but 
we will put it to work in the next chapter! 


Vector spaces and subspaces 


4.1 Introduction. The notion of a vector space grew from the discovery by 
Descartes that points in the Euclidean plane can be represented by ordered pairs 
of real numbers (and points in 3-dimensional space by ordered triples). We are 
faced with two completely different descriptions: a point in the plane, or a pair 
of real numbers. Moreover, operations on vectors look quite different according 
to which description is used. For example, we add vectors by the ‘parallelogram 
law’ used in mechanics, and we add pairs of real numbers ‘componentwise’; but 
the result is the same. 

Furthermore, we want the possibility to generalise. We want to be able to talk 
about Euclidean space of n dimensions, for any n. (This is not just an intellectual 
game, but has important applications in fields as far apart as quantum mechanics, 
statistics, and signal processing.) Also, we want to be able to use fields other than 
the real numbers. Computers send information as sequences of binary digits (that
is, n-tuples from the field $\mathbb{Z}_2$). So we define the concept of a vector space over an
arbitrary field F. We call the elements of F scalars, to distinguish them from
the vectors.

To set up the axioms, we regard two operations on vectors as basic. There 
is a binary operation of addition, written as + as usual. Also, for every field 
element c, there is a unary operation of scalar multiplication by c, written as 
juxtaposition: the product of the scalar c and the vector v is written as cv (with 
the scalar on the left). 

There is a potential problem here, since we have two different kinds of things, 
scalars and vectors, both of which can be added; there are two multiplications, 
one combining two scalars, the other a scalar and a vector; and we will see that 
there is a vector named 0 as well as a scalar with the same name. Sometimes, 
this problem is dealt with by using bold type for vectors. This is common in
3-dimensional mechanics, for example, where a vector is a ‘geometric’ object, 
unlike a scalar. I want to stress that a vector space, just like a field or a group, is 
an algebraic object; so I do not adopt this convention. I will usually use letters 
from near the end of the alphabet (typically u,v, w) for vectors, and letters from 
the other end (c,d) for scalars. There will be times when no convention can avoid 
confusion completely! 

We now give the formal definition. Let F be a field. A vector space over
F, or F-vector space, is a set V with a binary operation + (addition) and,
for each $c \in F$, a unary operation of scalar multiplication by c, satisfying the
following axioms:


Addition axioms

(VA0) (Closure law): For all $v, w \in V$, we have $v + w \in V$.

(VA1) (Associative law): $u + (v + w) = (u + v) + w$ for all $u, v, w \in V$.

(VA2) (Zero law): There exists $0 \in V$ such that $v + 0 = 0 + v = v$ for all
$v \in V$.

(VA3) (Inverse law): For all $v \in V$, there exists $w \in V$ with $v + w = w + v = 0$.

(VA4) (Commutative law): $v + w = w + v$ for all $v, w \in V$.

Scalar multiplication

(VM0) (Closure law): For all $c \in F$ and $v \in V$, we have $cv \in V$.

(VM1) For all $c \in F$ and $v, w \in V$, we have $c(v + w) = cv + cw$.

(VM2) For all $c, d \in F$ and $v \in V$, we have $(c + d)v = cv + dv$.

(VM3) For all $c, d \in F$ and $v \in V$, we have $(cd)v = c(dv)$.

(VM4) (Unital law): For all $v \in V$, we have $1v = v$ (where 1 is the identity
of F).


Remark 1. The addition axioms assert that a vector space, with the operation 
of addition, is an abelian group. This is a convenient way to remember them. 

2. It is possible to state the other axioms more briefly, too. If A is an abelian 
group, then the set End(A) of all homomorphisms from A to A is a ring, whose 
addition is defined pointwise, and whose multiplication is composition. (This is 
the endomorphism ring of A: see Exercise 3.28.) 


$$a(\phi_1 + \phi_2) = a\phi_1 + a\phi_2,$$
$$a(\phi_1\phi_2) = (a\phi_1)\phi_2.$$


Now let V be a vector space over F, and, for each $c \in F$, let $\mu_c$ denote the
operation of scalar multiplication by c, the function from V to V given by $v\mu_c = cv$.
Then axiom (VM1) says that $\mu_c$ is a homomorphism; that is, $\mu_c \in \mathrm{End}(V)$. Next,
let $\theta$ be the function from F to End(V) mapping the element c to the
homomorphism $\mu_c$. Then axioms (VM2) and (VM3) say that $\theta$ is a ring homomorphism,
and axiom (VM4) says that $\theta$ maps the identity element of F to the identity
homomorphism.

So we can reformulate the definition as follows:

    A vector space over F consists of an abelian group V and a ring
    homomorphism $\theta$ from F to End(V) which maps the identity to
    the identity.

Perhaps you find this less helpful than the list of ten axioms. But it does 
point to a close connection between vector spaces, groups and rings, and it is in 
a form which makes it easier to generalise in a meaningful way. 


4.2 Examples. 


Example 1 Let $V = F^n$, the set of all n-tuples of elements of F. Define
addition and scalar multiplication ‘coordinatewise’: that is,

$$(a_1, a_2, \ldots, a_n) + (b_1, b_2, \ldots, b_n) = (a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n),$$

$$c(a_1, a_2, \ldots, a_n) = (ca_1, ca_2, \ldots, ca_n).$$


This is a very important example. When we considered rings, we saw that, while 
there are quite varied examples, the ring of integers was sufficiently typical to 
act as a prototype. For groups, the examples were so varied that there was no 
useful prototype. The situation is different here. Not only is $F^n$ the prototype
of a vector space; we will see that every ‘finite-dimensional’ vector space looks
exactly like $F^n$.

For this reason, it is worth stopping to check that the ten vector space axioms
do hold for $F^n$. All the arguments are similar, and straightforward. For example,
here is the proof of (VM3):

$$(cd)(a_1,\ldots,a_n) = (cda_1,\ldots,cda_n) = c(da_1,\ldots,da_n) = c(d(a_1,\ldots,a_n)).$$

(We used the associative law for multiplication in F here.)


Example 2 Let $\Omega$ be any set, and let V be the set of all functions from $\Omega$ to
the field F. Define addition and scalar multiplication of functions ‘pointwise’:

$$(f + g)(x) = f(x) + g(x),$$
$$(cf)(x) = cf(x)$$

for all $x \in \Omega$. Then V is a vector space over F.

In the case where $\Omega$ is finite, say $\Omega = \{x_1,\ldots,x_n\}$, we can represent the
function f uniquely by giving the list of its values, $(f(x_1),\ldots,f(x_n))$. Any n-tuple
of elements of F forms the list of values of a unique function. So we can identify
V with the set $F^n$ of all such lists. The addition and multiplication are the same
as in Example 1.

Things are more interesting in the case where $\Omega$ is infinite. Suppose, for
example, that F is the field $\mathbb{R}$ of real numbers, and that $\Omega$ is either $\mathbb{R}$ or an
interval in $\mathbb{R}$. By restricting to ‘interesting’ classes of functions, such as
continuous functions or differentiable functions, we obtain further vector spaces. This
is the subject-matter of Functional Analysis.


Example 3 Let K be a field, containing a subfield F. Now suppose that we
are slightly forgetful, and while we can remember how to add elements of K, we
can only remember how to multiply them by elements of F. Then with this loss
of information, K becomes an F-vector space.

In particular, any field F is a vector space over itself; but this is no surprise,
since we can represent F as $F^1$, the set of all 1-tuples.

A more interesting case is that where $K = F[x]/(f)$, where f is an
irreducible polynomial over F of degree n. As we saw in Section 2.16, K can be
represented as

$$K = \{c_0 + c_1a + \cdots + c_{n-1}a^{n-1} : c_0, c_1, \ldots, c_{n-1} \in F\},$$

and in this representation we can add elements of K or multiply them by elements
of F (coordinatewise) without knowing the precise equation f(a) = 0 satisfied
by a. So this is just Example 1 again.


Example 4 The first vector space studied by mathematicians was the 
Euclidean plane. How do we see it as a vector space? 

A vector, in elementary geometry or mechanics, has magnitude and direction. 
If we choose a point in the plane to be the origin, then a vector is thought of 
as an arrow with its tail at the origin and its head at an arbitrary point of the 
plane. (The zero vector is a special case. Its head and tail are both at the origin; 
its length is zero and its direction is not defined.) 

Two vectors v,w are added by the parallelogram law: construct a paral- 
lelogram with one vertex at the origin and two sides corresponding to v and w 
(so that the heads of these vectors are two more vertices); then the fourth vertex 
is the head of v + w. (This has to be modified if v and w point in the same or
opposite directions, since then the parallelogram degenerates into a line.) If c 
is positive, then to multiply a vector v by c we take a vector with c times the 
length but in the same direction. If c is negative, we multiply the length by —c 
and reverse the direction. Finally, we take 0v to be the zero vector.

Following Descartes, we represent each point of the plane by a pair (x, y) 
of real numbers (its coordinates). In a similar way, we can represent a vector 
by coordinates, taking the coordinates of its head. Thus, this labelling identifies 
the set of vectors with $\mathbb{R}^2$. It can be checked that the rules for addition and
multiplication, when expressed in coordinates, are precisely those of Example 1. 


Example 5 In communication between computers, data is sent in the form 
of binary words, consisting of n-tuples of zeros and ones (for some fixed word 
length n). We regard the entries in such a word as being integers mod 2, that 
is, belonging to the field $F_2 = \mathbb{Z}/2\mathbb{Z}$. Assume for example that n = 8. Now
suppose that, during transmission, the signal is distorted by interference, so that, 
at the receiving end, the second, third, and fifth bits of the transmitted word 
are received incorrectly. Since changing an element of F2 can be done simply by 
adding 1 to it, we see that the effect of the noise is to add to the transmitted word 
the vector (01101000). In this sense, the received word is ‘signal plus noise’, the 
addition being performed in the vector space $F_2^8$. We will consider error correction
further in Chapter 8.
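
The ‘signal plus noise’ description can be made concrete in a few lines. In the sketch below, the noise vector is the one from the example, while the transmitted word `sent` is a hypothetical value chosen purely for illustration, as is the function name `add_f2`.

```python
def add_f2(u, v):
    """Componentwise sum of two words over the field with two elements."""
    if len(u) != len(v):
        raise ValueError("words must have the same length")
    return tuple((a + b) % 2 for a, b in zip(u, v))

sent = (1, 0, 1, 1, 0, 0, 1, 0)    # hypothetical transmitted word
noise = (0, 1, 1, 0, 1, 0, 0, 0)   # errors in the second, third and fifth bits
received = add_f2(sent, noise)

print(received)
print(add_f2(received, noise) == sent)
```

Adding the noise vector a second time recovers the transmitted word, because every vector over this field is its own additive inverse.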


4.3 Properties of vector spaces. Since a vector space is an abelian group 
(so far as addition is concerned), we can immediately deduce that all the 
properties of abelian groups hold. For example, 


(a) the sum of n vectors is well defined, independent of bracketing or order; 
(b) the zero is unique; 
(c) the additive inverse of any element is unique. 


Here are a couple more simple properties. 


(d) 0v = 0 for any v;
(e) (—1)v = —v for any v. 


Proof (d) 0v = (0+0)v = 0v+0v, by (VM2). Hence 0v = 0 by the Cancellation 
Law (which is valid in any abelian group). 

(e) $(-1)v + v = (-1)v + 1v = (-1 + 1)v = 0v = 0$; so $(-1)v$ is the
inverse of v.


4.4 Subspaces. What happens next should be fairly familiar now. A subset 
W of the vector space V (over a field F) is a subspace if it forms a vector space
in its own right. As we saw for groups and rings, in order that W is a subspace, it 
is enough to check the various closure properties, since universal laws such as the 
associative and unital laws will be inherited by W from V. In this case, we have 
to check closure under addition and scalar multiplication, and that W contains 0 
and contains the inverses of its elements. In fact, the first two conditions suffice: 


Theorem 4.1 (First subspace test) A non-empty subset W of a vector
space V is a subspace of V if and only if it is closed under addition and scalar
multiplication; that is, $w_1, w_2 \in W$ implies $w_1 + w_2 \in W$, and $c \in F$, $w \in W$
implies $cw \in W$.


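Proof Closure under scalar multiplication does the trick for us. For any $w \in W$,
we have $0 = 0w \in W$ and $-w = (-1)w \in W$, by the properties proved in the
last section. So W contains 0 and the inverses of its elements, and the remaining
axioms are inherited from V.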


As before, we can combine the two kinds of closure into a single test: 


Theorem 4.2 (Second subspace test) The non-empty subset W of the
vector space V over F is a subspace if and only if, for any $c_1, c_2 \in F$ and
$w_1, w_2 \in W$, we have $c_1w_1 + c_2w_2 \in W$.


Proof If W is a subspace, and $c_1, c_2, w_1, w_2$ are given, then the closure laws
show that $c_1w_1 + c_2w_2 \in W$.

Conversely, suppose that this condition holds. Then, choosing $c_2 = 0$, we see
that $c_1w_1 \in W$ for all $c_1 \in F$ and $w_1 \in W$; that is, W is closed under scalar
multiplication. Similarly, choosing $c_1 = c_2 = 1$ shows that it is closed under
addition.
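
Over a finite field the subspace tests can be applied exhaustively. The sketch below is an illustration: it assumes the field is the integers mod a prime p, represents vectors as tuples, and uses names chosen here. It applies the first subspace test to a line through the origin in $F_3^2$ and to a parallel line that misses the origin.

```python
from itertools import product

def is_subspace(W, p):
    """First subspace test for a set W of tuples over Z/pZ (p prime)."""
    W = set(W)
    if not W:
        return False
    closed_add = all(
        tuple((a + b) % p for a, b in zip(u, v)) in W
        for u in W for v in W
    )
    closed_scalar = all(
        tuple((c * a) % p for a in u) in W
        for c in range(p) for u in W
    )
    return closed_add and closed_scalar

V = list(product(range(3), repeat=2))                 # all of F_3^2
line = {v for v in V if (v[0] + v[1]) % 3 == 0}       # x + y = 0
shifted = {v for v in V if (v[0] + v[1]) % 3 == 1}    # x + y = 1

print(is_subspace(line, 3), is_subspace(shifted, 3))
```

The shifted line fails both closure conditions; in particular, multiplying any of its vectors by the scalar 0 gives the zero vector, which it does not contain.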


4.5 Linear independence and bases. Let V be a vector space over F,
and let $v_1, v_2, \ldots, v_n$ be vectors of V. We say that $v_1, v_2, \ldots, v_n$ are linearly
dependent if there are scalars $c_1, c_2, \ldots, c_n$, not all zero, such that

$$c_1v_1 + c_2v_2 + \cdots + c_nv_n = 0.$$

Note the importance of the phrase ‘not all zero’. If we allowed the coefficients
$c_1,\ldots,c_n$ to be all zero, then the equation would be true for any n vectors. Note
that if one of the vectors, say $v_i$, is zero, then they are linearly dependent: take
$c_i = 1$ and $c_j = 0$ for $j \ne i$. In a similar manner, if two of the vectors are equal,
say $v_i = v_j$, then they are linearly dependent: take $c_i = 1$, $c_j = -1$, and $c_k = 0$
for $k \ne i, j$.

Linear dependence can be formulated in another way. An expression
$a_1w_1 + \cdots + a_mw_m$ is called a linear combination of the vectors $w_1,\ldots,w_m$.


Proposition 4.3 The vectors $v_1,\ldots,v_n$ are linearly dependent if and only if
one of them can be expressed as a linear combination of the others.


Proof Suppose first that the vectors are linearly dependent. That is, we have
$c_1v_1 + \cdots + c_nv_n = 0$, where the coefficients are not all zero. Say that $c_i \ne 0$.
Then we have

$$v_i = -(c_1c_i^{-1})v_1 - \cdots - (c_{i-1}c_i^{-1})v_{i-1} - (c_{i+1}c_i^{-1})v_{i+1} - \cdots - (c_nc_i^{-1})v_n.$$

In other words, $v_i$ is a linear combination of the others.

Conversely, suppose that $v_i$ is expressed as a linear combination of the other vectors.
Then, subtracting $v_i$ from both sides of this equation, we find a linear
combination of all the vectors equal to zero, where the coefficient of $v_i$ is $-1 \ne 0$; so the
vectors are linearly dependent.


If the vectors $v_1, v_2, \ldots, v_n$ are not linearly dependent then, naturally enough,
they are called linearly independent. This is a negative definition, so we
reformulate the concept more positively. The vectors $v_1, v_2, \ldots, v_n$ are linearly
independent if, whenever an equation

$$c_1v_1 + c_2v_2 + \cdots + c_nv_n = 0$$

holds, then we have $c_1 = c_2 = \cdots = c_n = 0$.


Example Show that the vectors $(1, 1, 1)$, $(1, 2, 0)$, and $(0, 1, -3)$ in $\mathbb{R}^3$ are
linearly independent.


Solution Suppose that the equation 


$$a(1, 1, 1) + b(1, 2, 0) + c(0, 1, -3) = 0$$

holds. In other words,

$$(a + b,\ a + 2b + c,\ a - 3c) = (0, 0, 0).$$


This gives us three equations 


a+b=0, 
a+2b+c=0, 
a—3c=0, 


for the three unknowns a,b,c. Solving these equations, we find that a= b= 
c= 0. So the vectors are linearly independent. 
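
The same conclusion can be reached by row reduction, which is how such checks are usually automated. The sketch below is one possible approach and not the method used in the solution above: it computes the rank of the list of vectors by Gaussian elimination in exact rational arithmetic, and declares the vectors independent when the rank equals the number of vectors. The function names are chosen here.

```python
from fractions import Fraction

def rank(vectors):
    """Rank over the rationals of a list of equal-length vectors."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        pivot = next(
            (i for i in range(r, len(rows)) if rows[i][col] != 0), None
        )
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """Vectors are independent exactly when their rank equals their number."""
    return rank(vectors) == len(vectors)

print(linearly_independent([(1, 1, 1), (1, 2, 0), (0, 1, -3)]))
print(linearly_independent([(1, 1, 1), (1, 2, 0), (2, 3, 1)]))
```

In the second list the third vector is the sum of the first two, so by Proposition 4.3 the list is linearly dependent.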


A related concept is the span of a list of vectors. Let $v_1, v_2, \ldots, v_n$ be elements
of a vector space V over F. The span of these vectors, written $\langle v_1,\ldots,v_n \rangle_F$, is
the set of all linear combinations of $v_1,\ldots,v_n$. (If the field F is clear, we omit it
from the notation.)


Proposition 4.4 $\langle v_1,\ldots,v_n \rangle_F$ is a subspace of V.


Proof We apply the subspace test. Take two vectors in $\langle v_1,\ldots,v_n \rangle_F$, say
$w = c_1v_1 + \cdots + c_nv_n$ and $w' = c'_1v_1 + \cdots + c'_nv_n$. Then, for any $a, a' \in F$,

$$aw + a'w' = (ac_1 + a'c'_1)v_1 + \cdots + (ac_n + a'c'_n)v_n$$

is a linear combination of $v_1,\ldots,v_n$.


A list v1, v2,...,Un is called a spanning set for V if (u1,...,uUn) = V. If it 
is both linearly independent and a spanning set, it is called a basis. 


Theorem 4.5 The following conditions for a finite subset X of a vector space 
are equivalent: 


(a) X is a maximal linearly independent set; 
(b) X is a minimal spanning set; 
(c) X is a linearly independent spanning set. 


Proof We show that each of (a) and (b) is equivalent to (c). 

(c) implies (a): If X is a basis for V, then every vector in V is a linear 
combination of X; so adding any vector to X gives a linearly dependent set, 
which means that X is maximal. 

(a) implies (c): If X is maximal independent, then any vector v added to X 
gives a linearly dependent set. In a dependence relation, the coefficient of v must 
be non-zero (else X would be linearly dependent); so we can use the relation to 
express v as a linear combination of X. So X spans V, and is a basis. 

(c) implies (b): If X is a basis, then no element of X is a linear combination 
of the others, so no proper subset is spanning.

(b) implies (c): If X is a minimal spanning set, then no element of X is
a linear combination of the others (or it could be dropped without losing the 
spanning property); so X is linearly independent. 


Now if a vector space has a basis, we know what it looks like: 


Theorem 4.6 (a) If X is a basis for V, then every element of V has a unique
expression as a linear combination of X.
(b) If V has a basis containing n elements, then V is isomorphic to $F^n$.


Proof Let $v_1,\ldots,v_n$ be a basis. Any element $v \in V$ has an expression as a
linear combination, say $v = c_1v_1 + \cdots + c_nv_n$. Suppose that there is another such
expression, say $v = c'_1v_1 + \cdots + c'_nv_n$. Then

\[ (c_1 - c'_1)v_1 + \cdots + (c_n - c'_n)v_n = 0. \]

Since $v_1,\ldots,v_n$ are linearly independent, the coefficients are all zero; so $c_i = c'_i$
for all i.

Now we map V to $F^n$ by taking $v = c_1v_1 + \cdots + c_nv_n$ to the n-tuple
$(c_1,\ldots,c_n)$. The first part of the theorem shows that it is a bijection; and clearly
it preserves addition and scalar multiplication.


Now we know exactly what a vector space with a basis looks like, except for 
the possibility that there are bases with different numbers of elements. We now 
show that this cannot happen. 

To do this, we first need a technical result about linear equations. 

Let $x_1,\ldots,x_n$ be variables taking values in a field F. A homogeneous linear
equation in $x_1,\ldots,x_n$ is an equation of the form $c_1x_1 + \cdots + c_nx_n = 0$, where
$c_1,\ldots,c_n$ are given elements of F. A solution is just an assignment of values to
the variables so that the equation holds.


Proposition 4.7 Given m homogeneous linear equations in n variables, with
m < n, there is a simultaneous solution to the equations with not all the variables
equal to zero.


Proof We prove the result by induction on m. If m = 0, there are no equations,
and any non-zero assignment of values will do (one exists because n > 0). So the
induction starts.

Suppose that the result holds for fewer than m equations. Consider one of
the given equations, say $c_1x_1 + \cdots + c_nx_n = 0$. If all the coefficients $c_1,\ldots,c_n$ are
equal to zero, then the equation carries no information and can be discarded,
reducing the number of equations by one. So suppose that some coefficient is
non-zero; without loss of generality, the coefficient $c_n$. Now divide this equation
by $c_n$, and it expresses $x_n$ in terms of the other variables. Substitute this expression
in the remaining equations. We end up with m - 1 equations in n - 1 variables. By
induction, these have a non-zero solution. So we obtain a non-zero solution to
the original set of equations.

The result is proved.
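
The induction in this proof is really a recursive procedure, and can be sketched in Python as follows. This is only an illustration under stated assumptions: the coefficients are taken to be rational numbers, the name `nonzero_solution` is our own, and, instead of renumbering so that $c_n$ is non-zero, the code eliminates the first variable whose coefficient is non-zero.

```python
from fractions import Fraction

def nonzero_solution(equations, n):
    """Non-zero solution of m < n homogeneous equations in n variables.

    Each equation is a list of n coefficients c_1, ..., c_n standing for
    c_1 x_1 + ... + c_n x_n = 0.  The caller must ensure len(equations) < n.
    """
    eqs = [[Fraction(c) for c in eq] for eq in equations]
    # An equation with all coefficients zero carries no information.
    eqs = [eq for eq in eqs if any(c != 0 for c in eq)]
    if not eqs:
        return [Fraction(1)] * n  # no equations left: any non-zero choice works
    first, rest = eqs[0], eqs[1:]
    p = next(j for j in range(n) if first[j] != 0)  # variable to eliminate
    # Substitute the expression for x_p into the remaining equations.
    reduced = []
    for eq in rest:
        factor = eq[p] / first[p]
        new = [eq[j] - factor * first[j] for j in range(n)]
        reduced.append(new[:p] + new[p + 1:])  # coefficient of x_p is now 0
    partial = nonzero_solution(reduced, n - 1)  # the inductive step
    solution = partial[:p] + [None] + partial[p:]
    # Recover x_p from the first equation.
    solution[p] = -sum(first[j] * solution[j] for j in range(n) if j != p) / first[p]
    return solution

sol = nonzero_solution([[1, 1, 1], [1, 2, 3]], 3)
print(sol)
```

For the two equations $x_1 + x_2 + x_3 = 0$ and $x_1 + 2x_2 + 3x_3 = 0$ the sketch returns the solution $(1, -2, 1)$.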

Now we return to the properties of linear independence. Let V be a vector
space, and let $\mathcal{I}$ denote the set of linearly independent finite subsets of V. By
definition, the empty set is linearly independent.


Theorem 4.8 (Properties of linear independence) (a) If $X \in \mathcal{I}$ and
$Y \subseteq X$, then $Y \in \mathcal{I}$.
(b) (Steinitz exchange axiom) Suppose that $X, Y \in \mathcal{I}$ with $|Y| > |X|$. Then
there exists $y \in Y \setminus X$ such that $X \cup \{y\} \in \mathcal{I}$.


Proof (a) If we have a linear combination of a subset Y of X equal to zero, then 
we obtain a linear combination of all of X with the same value, by taking the 
coefficients of the remaining vectors to be zero. Since X is linearly independent, 
all the coefficients must be zero. So Y is linearly independent. 

(b) Suppose that both $\{x_1,\ldots,x_m\}$ and $\{y_1,\ldots,y_n\}$ are linearly independent,
with m < n. Let us also suppose, arguing for a contradiction, that the set
$\{x_1,\ldots,x_m,y_i\}$ is linearly dependent, for any i (with $1 \le i \le n$). This means
that there is a linear combination of $x_1,\ldots,x_m,y_i$ which is equal to zero, with not
all the coefficients zero. The coefficient of $y_i$ must be non-zero, since $x_1,\ldots,x_m$
are linearly independent. Dividing by this coefficient, and taking $y_i$ to the other
side of the equation, we find that $y_i$ is a linear combination of $x_1,\ldots,x_m$ for
every i. Suppose that

\[ y_i = c_{i1}x_1 + \cdots + c_{im}x_m \]

for $i = 1,\ldots,n$. We claim that the vectors $y_1,\ldots,y_n$ are linearly dependent.
Could an equation

\[ a_1y_1 + \cdots + a_ny_n = 0 \]

hold? Substituting the expression for the ys in terms of the xs, we find that the
equation holds provided that, for each j, the coefficient of $x_j$ vanishes:

\[ a_1c_{1j} + \cdots + a_nc_{nj} = 0. \]

Regarding these as m linear equations for the n unknowns $a_1,\ldots,a_n$, we see
from Proposition 4.7 that they have a non-zero solution. So $y_1,\ldots,y_n$ are linearly
dependent. But this contradicts the fact that they are linearly independent! So
the original assumption, that $\{x_1,\ldots,x_m,y_i\}$ is linearly dependent for all i, must
be wrong; so this set is linearly independent for some value of i, as required. This
completes the proof.


From this theorem we can deduce an important property. If there are no bases, 
then we call the vector space infinite-dimensional; it is finite-dimensional 
otherwise. So a space is infinite-dimensional if every linearly independent set can 
be enlarged to a larger linearly independent set. Infinite-dimensional spaces are 
important, but here we are only concerned with the finite-dimensional ones.


Theorem 4.9 If V has a basis, then any two bases have the same number of
elements.


Proof Suppose that X and Y are both bases; by Theorem 4.5, each is a maximal
linearly independent set. By the Steinitz exchange axiom, if $|X| < |Y|$, then we
could obtain a linearly independent set properly containing X by adding an element
of Y, which contradicts the maximality of X. Similarly, the assumption that
$|Y| < |X|$ leads to a contradiction. So $|X| = |Y|$.


If the vector space V has a basis (that is, if it is finite-dimensional), then the 
number of elements in any basis is called the dimension of V. 


Theorem 4.10 Two finite-dimensional vector spaces over the same field F are 
isomorphic if and only if they have the same dimension. 


Proof Combine the results of Theorems 4.9 and 4.6. 


4.6 Intersection and sum. Let V be a vector space, and let U, W be subspaces
of V. The intersection $U \cap W$ is a subspace, and is the largest subspace
of V contained in both U and W. See Exercise 4.2, which also asks you to show
that the union $U \cup W$ is not usually a subspace. Instead, we define the following:
The sum of the subspaces U and W is the set

\[ U + W = \{u + w : u \in U,\ w \in W\}. \]

It is a subspace, and is the smallest subspace of V which contains both U and W.


Proof We apply the Subspace Test. Take two vectors $u + w$ and $u' + w'$ in
$U + W$, and two scalars $c, c' \in F$. Then

\[ c(u + w) + c'(u' + w') = (cu + c'u') + (cw + c'w') \in U + W, \]

since $cu + c'u' \in U$ and $cw + c'w' \in W$.

Clearly, both U and W are contained in $U + W$. (For example, a vector in
U has the form $u + 0$.) Moreover, any subspace which contains both U and W
must contain $U + W$, so it is the smallest such subspace.


In the finite-dimensional case, the following equation connects the dimensions 
of these spaces: 


Theorem 4.11 If U and W are subspaces of V, then

\[ \dim(U \cap W) + \dim(U + W) = \dim(U) + \dim(W). \]


Proof Let $\dim(U) = m$, $\dim(W) = n$, and $\dim(U \cap W) = k$. Choose a basis
$x_1,\ldots,x_k$ for $U \cap W$. This is a linearly independent set in U, and so it can
be extended to a basis for U, say $x_1,\ldots,x_k,u_1,\ldots,u_{m-k}$. Similarly, it can be
extended to a basis $x_1,\ldots,x_k,w_1,\ldots,w_{n-k}$ for W. We claim that all the vectors
$x_1,\ldots,x_k,u_1,\ldots,u_{m-k},w_1,\ldots,w_{n-k}$ form a basis for $U + W$. Given this, the result
follows: for we have

\[ \dim(U + W) = k + (m - k) + (n - k) = m + n - k = \dim(U) + \dim(W) - \dim(U \cap W). \]

Any vector in $U + W$ can be written as a linear combination of the $x_i$, $u_i$, and
$w_i$; so we have a spanning set. Suppose that we have a linear dependence

\[ a_1x_1 + \cdots + a_kx_k + b_1u_1 + \cdots + b_{m-k}u_{m-k} + c_1w_1 + \cdots + c_{n-k}w_{n-k} = 0. \]

Transposing, we obtain

\[ a_1x_1 + \cdots + a_kx_k + b_1u_1 + \cdots + b_{m-k}u_{m-k} = -c_1w_1 - \cdots - c_{n-k}w_{n-k}. \]

The left-hand expression is in U, and the right-hand expression in W; so both
sides lie in $U \cap W$. So they can be expressed in a third form, $d_1x_1 + \cdots + d_kx_k$.
Now this and the left-hand side are two expressions for a vector of U in terms of
the basis of U, so they coincide; that is, $a_i = d_i$ for $i = 1,\ldots,k$, and $b_i = 0$ for
$i = 1,\ldots,m-k$. Performing the same argument with the right-hand expression
gives $0 = d_i$ for $i = 1,\ldots,k$, and $c_i = 0$ for $i = 1,\ldots,n-k$. Combining
these, we see that all the coefficients are zero. So the xs, us, and ws are linearly
independent, completing the proof.
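
The theorem is often used in the form $\dim(U \cap W) = \dim(U) + \dim(W) - \dim(U + W)$, since the three dimensions on the right are easy to compute from spanning sets. Here is a hedged Python sketch of that calculation; it assumes rational coordinates, the helper name `dim_span` is our own, and the two subspaces of $\mathbb{Q}^3$ are chosen purely for illustration.

```python
from fractions import Fraction

def dim_span(vectors):
    """Dimension of the span of the given vectors, by exact elimination."""
    m = [[Fraction(x) for x in v] for v in vectors]
    pivots = 0
    for col in range(len(m[0])):
        k = next((i for i in range(pivots, len(m)) if m[i][col] != 0), None)
        if k is None:
            continue
        m[pivots], m[k] = m[k], m[pivots]
        for i in range(pivots + 1, len(m)):
            c = m[i][col] / m[pivots][col]
            m[i] = [a - c * b for a, b in zip(m[i], m[pivots])]
        pivots += 1
    return pivots

U = [(1, 0, 0), (0, 1, 0)]  # spanning set for U
W = [(0, 1, 0), (0, 0, 1)]  # spanning set for W

# The two lists together span the subspace U + W
# (here `U + W` is Python list concatenation).
dim_sum = dim_span(U + W)
dim_intersection = dim_span(U) + dim_span(W) - dim_sum  # Theorem 4.11
print(dim_span(U), dim_span(W), dim_sum, dim_intersection)  # 2 2 3 1
```

In this example the intersection is spanned by $(0,1,0)$, in agreement with the computed value 1.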


Exercise 4.1 Let V be the vector space of all real-valued functions on the unit interval 
[0, 1]. Show that each of the following is a subspace of V: 


(a) the bounded functions; 

(b) the continuous functions; 

(c) the differentiable functions; 

(d) the functions f satisfying f(0) = f(1). 


Exercise 4.2 Let W and U be subspaces of the vector space V.


(a) Prove that the intersection $W \cap U$ is a subspace of V.
(b) Prove that the union $W \cup U$ is a subspace if and only if one of W and U contains
the other.


Exercise 4.3 Let V be the Euclidean plane, regarded as a real vector space. 

(a) Prove that the set {0}, the whole space V, and any line through the origin, are 
subspaces of V. 

(b) Prove further that every subspace of V is of one of these types. 


Exercise 4.4 Show that the set of all n-tuples of elements of F which satisfy a given
collection of homogeneous linear equations is a subspace of $F^n$.


Exercise 4.5 In $V = \mathbb{R}^4$, let $U = \langle (1,2,0,-1), (2,1,1,3), (1,-1,1,2) \rangle$, and let
$W = \langle (3,2,0,2), (2,2,0,1) \rangle$. Find a basis for $U \cap W$.


Exercise 4.6 (**) The conditions we proved for linearly independent sets in a vector 
space in Theorem 4.8 have been applied much more widely. In the tradition of abstract 
mathematics, they are now considered as axioms. 

A matroid consists of a finite set E and a non-empty family $\mathcal{I}$ of subsets of E,
satisfying the two conditions


(Mat1) If $X \in \mathcal{I}$ and $Y \subseteq X$, then $Y \in \mathcal{I}$.
(Mat2) (Exchange axiom) If $X, Y \in \mathcal{I}$ and $|X| < |Y|$, then there exists $y \in Y \setminus X$
such that $X \cup \{y\} \in \mathcal{I}$.


Thus, Theorem 4.8 states: A vector space, equipped with the family of its independent
subsets, is a matroid. This was the original, motivating example. But there are other
important examples in discrete mathematics. Verify the following.


(a) A graph consists of a set V of vertices and a set E of edges, each edge joining
a pair of vertices. We allow an edge to join a vertex to itself (such an edge is called a
loop), and we also allow more than one edge to join the same pair of vertices (such
edges are called parallel).

A circuit in a graph is a sequence $e_1, e_2, \ldots, e_n$ of edges, such that there are distinct
vertices $v_1, v_2, \ldots, v_n$ so that $e_i$ joins $v_i$ and $v_{i+1}$ for $i < n$ while $e_n$ joins $v_n$ and $v_1$. A
circuit with n = 1 is a loop, and a circuit with n = 2 consists of two parallel edges.
A set of edges is acyclic if it contains no circuit.

Prove that, if $\mathcal{I}$ is the set of acyclic sets of edges, then $(E, \mathcal{I})$ is a matroid.


(b) Let E be a set, and let $\mathcal{X}$ be a family of subsets of E. We say that a subset
$\{e_1, e_2, \ldots, e_r\}$ of E is a partial transversal for $\mathcal{X}$ if there exist distinct sets
$X_1, X_2, \ldots, X_r$ in $\mathcal{X}$ such that $e_i \in X_i$ for $i = 1, 2, \ldots, r$.

Prove that, if $\mathcal{I}$ is the set of all partial transversals for $\mathcal{X}$, then $(E, \mathcal{I})$ is a matroid.


Remark In a matroid $(E, \mathcal{I})$, all maximal members of $\mathcal{I}$ have the same number
of elements.


Linear transformations and matrices 


4.7 Linear transformations. In this section, we examine homomorphisms 
of vector spaces. So this corresponds to ‘homomorphisms and ideals’ of rings, or 
‘homomorphisms and normal subgroups’ of groups. There are two differences. 
First, the homomorphisms of vector spaces are almost universally called ‘linear 
transformations’ or ‘linear maps’. Second, in the cases of rings and groups, we 
saw that kernels of homomorphisms are subrings or subgroups that satisfy some 
additional property. Here, by contrast, any subspace can be the kernel of a 
homomorphism. 

Let V and W be vector spaces over the same field F. A linear transformation
from V to W is a function $\theta: V \to W$ which satisfies the following
conditions:


(a) For any $v_1, v_2 \in V$, $(v_1 + v_2)\theta = v_1\theta + v_2\theta$.
(b) For any $v \in V$, $c \in F$, $(cv)\theta = c(v\theta)$.


These two conditions are equivalent to the single condition

\[ (c_1v_1 + c_2v_2)\theta = c_1(v_1\theta) + c_2(v_2\theta) \]

for $v_1, v_2 \in V$ and $c_1, c_2 \in F$.
As you would expect, the image of $\theta$ is

\[ \mathrm{Im}(\theta) = \{w \in W : w = v\theta \text{ for some } v \in V\}, \]

while its kernel is

\[ \mathrm{Ker}(\theta) = \{v \in V : v\theta = 0\}. \]

The image is a subspace of W, whose dimension is called the rank of $\theta$; the
kernel is a subspace of V, whose dimension is the nullity of $\theta$.

Let U be a subspace of V. Set $v_1 \sim v_2$ if $v_2 - v_1 \in U$. This is an equivalence
relation, whose equivalence classes are the cosets of U in V. (This is exactly
the same usage as for the cosets of a subgroup of an abelian group, where no
distinction between left and right cosets has to be made. In fact, the cosets in
the present sense are exactly the cosets of U in the additive group of V.) A coset
of U has the form

\[ U + v = \{u + v : u \in U\}. \]


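Theorem 4.12 (a) $\mathrm{Im}(\theta)$ is a subspace of W.
(b) $\mathrm{Ker}(\theta)$ is a subspace of V.
(c) Two vectors $v_1, v_2 \in V$ satisfy $v_1\theta = v_2\theta$ if and only if they lie in the same
coset of $\mathrm{Ker}(\theta)$.


Proof This works in the same way as for groups or rings.
(a) If $w_1, w_2 \in \mathrm{Im}(\theta)$, say $w_1 = v_1\theta$ and $w_2 = v_2\theta$, and $c_1, c_2 \in F$, then

\[ c_1w_1 + c_2w_2 = c_1(v_1\theta) + c_2(v_2\theta) = (c_1v_1 + c_2v_2)\theta \in \mathrm{Im}(\theta), \]

so $\mathrm{Im}(\theta)$ is a subspace.
(b) If $v_1, v_2 \in \mathrm{Ker}(\theta)$, then $v_1\theta = v_2\theta = 0$, so

\[ (c_1v_1 + c_2v_2)\theta = c_1(v_1\theta) + c_2(v_2\theta) = 0, \]

so $c_1v_1 + c_2v_2 \in \mathrm{Ker}(\theta)$; so $\mathrm{Ker}(\theta)$ is a subspace.

(c) $v_1\theta = v_2\theta$ if and only if $v_2 - v_1 \in \mathrm{Ker}(\theta)$, that is, if and only if
$\mathrm{Ker}(\theta) + v_1 = \mathrm{Ker}(\theta) + v_2$.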


Given a subspace U of a vector space V, we define the factor space V/U 
as follows: its elements are the cosets of U in V, and addition and scalar 
multiplication are defined in the now-familiar way: 


(a) $(U + v_1) + (U + v_2) = U + (v_1 + v_2)$;
(b) $c(U + v) = U + cv$.


Proposition 4.13 If U is a subspace of V, then the factor space V/U is a
vector space. If V is finite-dimensional, then $\dim(V/U) = \dim(V) - \dim(U)$.


Now the Isomorphism Theorems hold. I state the first one without proof, and 
leave the others as exercises. 


Theorem 4.14 (First Isomorphism Theorem) Let $\theta: V \to W$ be a linear
transformation. Then


(a) $\mathrm{Im}(\theta)$ is a subspace of W;
(b) $\mathrm{Ker}(\theta)$ is a subspace of V;
(c) $V/\mathrm{Ker}(\theta) \cong \mathrm{Im}(\theta)$.


As a corollary, we have:


Proposition 4.15 (Rank and Nullity Theorem) If $\theta: V \to W$ is a linear
transformation, then $\dim(\mathrm{Im}(\theta)) + \dim(\mathrm{Ker}(\theta)) = \dim(V)$.


We give another proof of this important result. 


Proof Choose a basis $u_1,\ldots,u_r$ for $\mathrm{Ker}(\theta)$, where $r = \dim(\mathrm{Ker}(\theta))$. This is
a linearly independent set in V, and so can be extended to a basis for V, say
$u_1,\ldots,u_r,v_1,\ldots,v_{n-r}$, where $n = \dim(V)$. We claim that $v_1\theta,\ldots,v_{n-r}\theta$ is a
basis for $\mathrm{Im}(\theta)$. From this, it follows that $\dim(\mathrm{Im}(\theta)) = n - r$, and the result is
proved.

$v_1\theta,\ldots,v_{n-r}\theta$ spans $\mathrm{Im}(\theta)$: choose any vector of $\mathrm{Im}(\theta)$, say $v\theta$. Write v in
terms of the basis for V, say

\[ v = b_1u_1 + \cdots + b_ru_r + c_1v_1 + \cdots + c_{n-r}v_{n-r}. \]

Now apply $\theta$. Since $u_i\theta = 0$ for all i (as these vectors are in the kernel of $\theta$),
we have

\[ v\theta = c_1(v_1\theta) + \cdots + c_{n-r}(v_{n-r}\theta), \]

as required.
$v_1\theta,\ldots,v_{n-r}\theta$ are linearly independent: Suppose that

\[ c_1(v_1\theta) + \cdots + c_{n-r}(v_{n-r}\theta) = (c_1v_1 + \cdots + c_{n-r}v_{n-r})\theta = 0. \]

So $c_1v_1 + \cdots + c_{n-r}v_{n-r} \in \mathrm{Ker}(\theta)$. Hence this vector can be written in terms of
the basis for $\mathrm{Ker}(\theta)$, say

\[ c_1v_1 + \cdots + c_{n-r}v_{n-r} = b_1u_1 + \cdots + b_ru_r. \]

This implies that

\[ -b_1u_1 - \cdots - b_ru_r + c_1v_1 + \cdots + c_{n-r}v_{n-r} = 0. \]

But the us and vs are linearly independent, since they form a basis for V. So
$b_i = c_i = 0$ for all i, as required.


4.8 Matrices. A matrix is simply a rectangular array, whose entries can
be anything at all but are most usually numbers (or, more generally, elements 
of a field). Its purpose may be just to record these numbers (which might be 
distances between cities, results in a football league, or transition probabilities 
in a stochastic process); but by far the most important use of matrices is to 
describe linear transformations of a vector space. 

Let U and V be finite-dimensional vector spaces. Choose bases $u_1,\ldots,u_m$
for U, and $v_1,\ldots,v_n$ for V. Then U and V can be identified with $F^m$ and $F^n$
respectively, where, for example, the vector $(x_1,\ldots,x_m) \in F^m$ corresponds to
$x_1u_1 + \cdots + x_mu_m \in U$. Now let $S: U \to V$ be a linear transformation. For
$1 \le i \le m$, $u_iS$ is a vector in V, so has the form $a_{i1}v_1 + \cdots + a_{in}v_n$, where
$a_{ij} \in F$. Then we say that the matrix $A = (a_{ij})$, the $m \times n$ matrix having
entry $a_{ij}$ in row i and column j, represents S relative to the given bases for
U and V.

The choice of bases thus has two effects: the vector spaces are identified
with $F^m$ and $F^n$, and the linear transformation is represented by a matrix. The
connection goes deeper:


Proposition 4.16 Let $S: U \to V$ be a linear transformation. Choose bases in
U and V, and let A be the matrix representing S relative to these bases. Then if U
and V are identified with $F^m$ and $F^n$, the action of S is given by $x \mapsto xA$, where
$x \in F^m$ is regarded as a $1 \times m$ matrix, and the operation is matrix multiplication.


Proof We calculate. The m-tuple $(x_1,\ldots,x_m)$ corresponds to $\sum_{i=1}^m x_iu_i$,
which is mapped to

\[ \sum_{i=1}^m x_i(u_iS) = \sum_{i=1}^m \sum_{j=1}^n x_ia_{ij}v_j. \]

The coefficient of $v_j$ is $\sum_{i=1}^m x_ia_{ij}$, which is the jth coordinate in the matrix
product $xA$.
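
As a small illustration of the rule $x \mapsto xA$, here is a Python sketch. The function name `row_times_matrix` and the particular matrix are our own choices, made only for the example; the matrix is stored as a list of rows, so that row i holds the coordinates of $u_iS$.

```python
def row_times_matrix(x, A):
    """Product xA of a row vector x (length m) with an m x n matrix A."""
    m, n = len(A), len(A[0])
    if len(x) != m:
        raise ValueError("x must have one entry per row of A")
    # jth coordinate of xA is the sum over i of x_i * a_ij
    return [sum(x[i] * A[i][j] for i in range(m)) for j in range(n)]

# Hypothetical example: u_1 S = (1, 2, 3) and u_2 S = (4, 5, 6).
A = [[1, 2, 3],
     [4, 5, 6]]

print(row_times_matrix([1, 0], A))   # image of u_1: [1, 2, 3]
print(row_times_matrix([2, -1], A))  # image of 2u_1 - u_2: [-2, -1, 0]
```

Note that, because maps are written on the right in this book, vectors are rows and the matrix multiplies on the right.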


Composition of linear transformations corresponds to matrix multiplication: 


Proposition 4.17 Let $S: U \to V$ and $T: V \to W$ be linear transformations
of finite-dimensional vector spaces. Choose bases in the three spaces, and let
A and B be the matrices representing S and T respectively. Then the matrix
representing ST is AB.


Proof Recall that maps are written on the right, so ST means ‘first S, then T’.
Let the basis for U be $u_1,\ldots,u_m$, let that for V be $v_1,\ldots,v_n$, and let
that for W be $w_1,\ldots,w_p$. Also, let $A = (a_{ij})$ and $B = (b_{jk})$. These matrices are
$m \times n$ and $n \times p$ respectively, so they can be multiplied.

We have

\[ u_iST = \sum_{j=1}^n a_{ij}v_jT = \sum_{j=1}^n \sum_{k=1}^p a_{ij}b_{jk}w_k. \]

So the coefficient of $w_k$ is $\sum_{j=1}^n a_{ij}b_{jk}$, which is precisely the (i,k) entry of AB,
by the definition of matrix multiplication.
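
The statement can be tested numerically: applying A and then B to a row vector should agree with applying the single matrix AB. The sketch below is only an illustration; the helper names `matmul` and `row_times_matrix` and the sample matrices are our own assumptions.

```python
def matmul(A, B):
    """Matrix product AB; A is m x n and B is n x p, both lists of rows."""
    n, p = len(B), len(B[0])
    return [[sum(row[j] * B[j][k] for j in range(n)) for k in range(p)]
            for row in A]

def row_times_matrix(x, A):
    """Row vector times matrix, treating x as a 1 x m matrix."""
    return matmul([x], A)[0]

A = [[1, 2, 3],
     [4, 5, 6]]       # represents S : U -> V  (2 x 3)
B = [[1, 0],
     [0, 1],
     [1, 1]]          # represents T : V -> W  (3 x 2)

x = [2, -1]
first_S_then_T = row_times_matrix(row_times_matrix(x, A), B)
via_product = row_times_matrix(x, matmul(A, B))
print(first_S_then_T, via_product)  # [-2, -1] [-2, -1]
```

Here $AB$ is the $2 \times 2$ matrix with rows $(4, 5)$ and $(10, 11)$.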


These results are not a miracle. Once we agree that the most important 
purpose of a matrix is to represent a linear transformation, we naturally want 
to define matrix multiplication so as to reflect the composition of linear
transformations. So we should instead regard these results as justifying the rather
unintuitive definition of matrix multiplication. Think of the above proposition 
in these terms: ‘I want the matrix product to correspond to composition of 
transformations. What definition should I use?’ 


4.9 Change of basis. The matrix representing a linear transformation is 
defined only relative to bases for the two vector spaces. Change the bases, and 
the matrix will change. We now have to understand how this change works. 
Again, matrices are involved. 

Let V be a finite-dimensional vector space. Let $v_1,\ldots,v_n$ and $v'_1,\ldots,v'_n$ be
two bases for V. The transition matrix between the two bases is the $n \times n$
matrix $Q = (q_{ij})$, where $v'_i = \sum_{j=1}^n q_{ij}v_j$. (The rule is: express the new basis
vectors in terms of the old ones, and take the matrix of coefficients.)

Suppose that $v''_1,\ldots,v''_n$ is yet another basis, and let $Q'$ be the transition
matrix from the primed to the doubly primed basis. Then the transition matrix
from the unprimed to the doubly primed basis is $Q'Q$. (Note the reversal!) For
we have

\[ v''_i = \sum_{j=1}^n q'_{ij}v'_j = \sum_{j=1}^n \sum_{k=1}^n q'_{ij}q_{jk}v_k. \]

The coefficient of $v_k$ is thus $\sum_{j=1}^n q'_{ij}q_{jk}$, which is the (i,k) entry of $Q'Q$.

The transition matrix from a basis to itself is the identity matrix I. Hence
it follows that, if Q is the transition matrix between two bases of V, then the
transition matrix between the bases in the reverse direction is $Q^{-1}$. In particular,
we see that a transition matrix is invertible (that is, it has an inverse).

Now we can describe the effect of changes of basis on the matrix of a linear 
transformation. 


Theorem 4.18 Let $S: U \to V$ be a linear transformation. Choose two bases
for U and V; let the transition matrix between the first and second bases in U
be P, and the transition matrix between the first and second bases in V be Q.
Suppose that the matrix representing S relative to the first bases in U and V is
A, and the matrix representing S relative to the second bases is $A'$. Then

\[ A' = PAQ^{-1}. \]


Proof Let $u_1,\ldots,u_m$ and $u'_1,\ldots,u'_m$ be the two bases for U, and let $v_1,\ldots,v_n$
and $v'_1,\ldots,v'_n$ be the two bases for V. Then $A' = (a'_{ij})$, where

\[ u'_iS = \sum_{j=1}^n a'_{ij}v'_j. \]

Set $Q^{-1} = R = (r_{ij})$. Then

\[ u'_i = \sum_{j=1}^m p_{ij}u_j, \qquad v_k = \sum_{l=1}^n r_{kl}v'_l. \]

Hence

\[ u'_iS = \sum_{j=1}^m p_{ij}u_jS = \sum_{j=1}^m \sum_{k=1}^n p_{ij}a_{jk}v_k = \sum_{j=1}^m \sum_{k=1}^n \sum_{l=1}^n p_{ij}a_{jk}r_{kl}v'_l. \]

So $a'_{il} = \sum_{j=1}^m \sum_{k=1}^n p_{ij}a_{jk}r_{kl}$, or $A' = PAR = PAQ^{-1}$.


Now, if S is a linear transformation from U to V, there is a good choice of 
basis which greatly simplifies the form of the matrix. This is the choice that we 
made in the Rank and Nullity Theorem, with a small twist. 


Theorem 4.19 Let $S: U \to V$ be a linear transformation. Then there is a
natural number r and a choice of bases in U and V such that the matrix of S
relative to these bases is

\[ \begin{pmatrix} I_r & O \\ O & O \end{pmatrix}, \]

where $I_r$ is an $r \times r$ identity matrix and O denotes a zero matrix of the appropriate
size.


Proof Choose a basis for Ker(S) and extend to a basis for U. Choose the
numbering so that, if $u_1,\ldots,u_m$ is the basis for U, then the last $m - r$ basis
vectors form a basis for Ker(S). The proof of the Rank and Nullity Theorem shows
that, if $v_i = u_iS$ for $i = 1,\ldots,r$, then $v_1,\ldots,v_r$ are linearly independent, and so can be
extended to a basis for V, say $v_1,\ldots,v_n$.

Now we have

\[ u_iS = \begin{cases} v_i & \text{if } 1 \le i \le r, \\ 0 & \text{otherwise;} \end{cases} \]

so the matrix representing S relative to these bases is as claimed.


The matrices P and Q are not unique here; but, whichever ones we choose, 
the number r (called the rank of the matrix A) is the same, since r is the 
dimension of the image of the linear transformation represented by A.

Sometimes, two matrices A and $A'$ of the same size are called ‘equivalent’
if they represent the same linear transformation relative to possibly different
bases. Another way of saying the same thing is that A and $A'$ are equivalent if
there exist invertible matrices P and Q such that $A' = PAQ^{-1}$. Now ‘equivalence’
is an equivalence relation; and every equivalence class contains a unique
matrix of the form

\[ \begin{pmatrix} I_r & O \\ O & O \end{pmatrix}. \]

So the number of ‘equivalence’ classes of $m \times n$ matrices is equal to
the number of possible values of r, which is $1 + \min(m, n)$, since all the values
$0, 1, \ldots, \min(m, n)$ can occur. (The matrices $\begin{pmatrix} I_r & O \\ O & O \end{pmatrix}$ form a set of canonical
forms for the relation of ‘equivalence’, since each equivalence class contains just
one of them.)


4.10 Elementary operations. Theorem 4.19 describes canonical forms for 
the relation of ‘equivalence’, but gives no hint about how to find r, P,Q for 
a given A. We now consider this problem, and end up giving a different (and 
algorithmic) proof of the theorem. 

We define three types of elementary row operations on a matrix A, as 
follows: 


(Er1) Add a scalar multiple of one row to another. 
(Er2) Multiply a row by a non-zero scalar. 
(Er3) Interchange two rows. 


Note Strictly speaking, the operations of type (Er3) are unnecessary, since
they can be obtained by a combination of the other types. Given two rows (say
the ith and jth rows, $R_i$ and $R_j$), the sequence of operations:


• add the ith row to the jth;
• subtract the jth row from the ith;
• add the ith row to the jth;
• multiply the ith row by $-1$;


has the effect

\[ (R_i, R_j) \to (R_i, R_j + R_i) \to (-R_j, R_j + R_i) \to (-R_j, R_i) \to (R_j, R_i). \]

However, it is convenient to keep all three types.
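
The four steps in the Note are easy to try out. In the following Python sketch the helper names `add_multiple` and `scale` are our own, rows are numbered from 0 as is usual in Python, and the matrix is an arbitrary example.

```python
def add_multiple(M, c, src, dst):
    """(Er1) add c times row `src` to row `dst`, in place."""
    M[dst] = [a + c * b for a, b in zip(M[dst], M[src])]

def scale(M, c, row):
    """(Er2) multiply row `row` by the non-zero scalar c, in place."""
    M[row] = [c * a for a in M[row]]

M = [[1, 2],
     [3, 4]]
i, j = 0, 1

add_multiple(M, 1, i, j)    # add the ith row to the jth
add_multiple(M, -1, j, i)   # subtract the jth row from the ith
add_multiple(M, 1, i, j)    # add the ith row to the jth
scale(M, -1, i)             # multiply the ith row by -1

print(M)  # [[3, 4], [1, 2]]: the two rows have been interchanged
```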


For example, consider the matrix $\begin{pmatrix} 2 & 3 \\ 4 & 5 \end{pmatrix}$, and perform elementary operations
as follows:


• Subtract twice the first row from the second: we obtain $\begin{pmatrix} 2 & 3 \\ 0 & -1 \end{pmatrix}$.
• Add three times the second row to the first: we obtain $\begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}$.
• Multiply the first row by 1/2: we obtain $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$.
• Multiply the second row by $-1$: we obtain $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

You should compare this procedure with the technique for solving linear 
equations. Suppose, for example, that we were given the equations 


2x + 3y = 8, 
4x + 5y = 14. 


We subtract twice the first equation from the second to eliminate x, giving
$-y = -2$. Adding three times this equation to the first gives $2x = 2$. Now these
equations imply $x = 1$, $y = 2$.
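
To see the parallel concretely, one can carry the right-hand sides along as a third column and apply the same four row operations. This is a sketch only: the helper names are our own, and the third column is an addition for the purpose of the illustration.

```python
from fractions import Fraction

def add_multiple(M, c, src, dst):
    """(Er1) add c times row `src` to row `dst`, in place."""
    M[dst] = [a + c * b for a, b in zip(M[dst], M[src])]

def scale(M, c, row):
    """(Er2) multiply row `row` by the non-zero scalar c, in place."""
    M[row] = [c * a for a in M[row]]

# Rows hold the coefficients of x and y followed by the right-hand side.
M = [[Fraction(2), Fraction(3), Fraction(8)],
     [Fraction(4), Fraction(5), Fraction(14)]]

add_multiple(M, -2, 0, 1)     # second row becomes 0, -1, -2   (that is, -y = -2)
add_multiple(M, 3, 1, 0)      # first row becomes  2,  0,  2   (that is, 2x = 2)
scale(M, Fraction(1, 2), 0)   # first row becomes  1,  0,  1   (x = 1)
scale(M, -1, 1)               # second row becomes 0,  1,  2   (y = 2)

x, y = M[0][2], M[1][2]
print(x, y)  # 1 2
```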

Not every matrix can be transformed to the identity by performing elemen- 
tary row operations on it. We now investigate just what can be achieved by these 
operations. 


Definition A matrix A is said to be in echelon form if the following two 
conditions hold: 


(Ech1) If a row of A is non-zero, then its first non-zero entry is equal to 1. (This 
entry is called the leading 1 of the row.) 

(Ech2) The non-zero rows occur before all the zero rows; and, if the ith and jth
rows are both non-zero with $i < j$, then the leading 1 in the jth row occurs
to the right of that in the ith row.


It is in reduced echelon form if the following condition also holds: 
(Ech3) All other entries in the column containing a leading 1 are zero. 


The term ‘echelon form’ is meant to suggest the way geese fly: each goose 
flies behind and to one side of the one in front (presumably for aerodynamic 
reasons). 

For example, the matrix 


\[ \begin{pmatrix} 1 & 2 & 0 & 3 & 4 \\ 0 & 0 & 1 & 5 & 6 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \]


is in reduced echelon form. If the entry in the first row and third column had 
instead been 7, the matrix would be in echelon but not reduced echelon. 

Note that it follows from (Ech1) and (Ech2) that, if the matrix A is in echelon 
form, then all the entries in the column of a leading 1 which lie below that 
leading 1 must be zero; that is, half of the condition (Ech3) holds automatically. 


Theorem 4.20 Any matrix can be transformed into reduced echelon form by 
a sequence of elementary row operations.


Proof The algorithm proceeds in two stages. First, we apply a recursive procedure
to bring the matrix into echelon form. Then it is a fairly simple matter
to convert echelon to reduced echelon. In fact, the algorithm is just the familiar 
solution method for systems of linear equations, slightly disguised. 

Let A be a matrix. We define the following procedure. If A = O, then A is
already in echelon form; so we simply report success. Suppose that $A \ne O$. Let
m be the number of the leftmost column which contains a non-zero entry, and
let k be the number of the topmost row having a non-zero entry in this column.
First, if k > 1, then interchange the first and kth rows; thus we can assume that
k = 1. Next, if $a_{1m} \ne 1$, multiply the first row by $(a_{1m})^{-1}$; thus we can assume
that $a_{1m} = 1$. Now, for every i > 1 for which $a_{im} \ne 0$, subtract $a_{im}$ times the
first row from the ith row. This gives us a matrix in which every entry after the
first in the mth column is zero.

Let B be the submatrix consisting of all rows except the first. Apply the 
algorithm recursively to reduce B to echelon form. Since the first row of A 
is untouched by these operations, and all elements of A in the other rows 
and the first m columns are zero, the same operations can be applied to A 
without affecting what we have done already. The resulting matrix is in ech- 
elon form, as all the leading 1s in rows after the first occur in columns after 
the mth. 


The description in terms of linear equations is very natural. If all columns
before the mth are zero, then the first $m - 1$ variables do not actually appear
in the equations at all (they will end up as free parameters in the solution).
We take the first variable $x_m$ which does occur, and divide by its coefficient
in some equation. This effectively uses that equation to express $x_m$ in terms of
later variables. Subtracting multiples of this equation from the others amounts
to using the expression for $x_m$ to eliminate it from the remaining equations.
The recursive step says ‘solve these equations for $x_{m+1},\ldots,x_n$’. Then use these
solutions to find $x_m$, and we have finished.


In Stage 2, we have A in echelon form, and wish to convert it to reduced
echelon. We work through the non-zero rows from top to bottom. Consider the
ith row, and suppose that its leading 1 appears in the $m_i$th column. As we
remarked earlier, all elements of this column below the ith row are already zero.
If an earlier row (say, the jth) has a non-zero element in the $m_i$th column, we
remove it by subtracting $a_{jm_i}$ times the ith row from the jth row. Since all
entries in the ith row earlier than the $m_i$th column are zero, earlier entries in
the jth row are unaffected, although later ones may change.

This corresponds to massaging the solution we already found for the linear 
equations. As we saw, columns containing leading 1s correspond to vari- 
ables which can be expressed in terms of later variables; all other variables 
appear as independent parameters in the solution. Now converting to reduced 
echelon ensures that, in the solution, each variable is expressed in terms of 
the free parameters only; no further substitutions are required to read off 
the values.

All this is easier to understand in terms of a worked example. We take
a matrix and apply the algorithm to it, following our progress with the 
corresponding set of linear equations. 


Let

\[ A = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 \\ 0 & 2 & 4 & 7 & 9 & 12 \\ 0 & 0 & 0 & 2 & 5 & 4 \end{pmatrix}. \]

The corresponding system of equations reads

\[ x_2 + 2x_3 + 3x_4 + 4x_5 + 5x_6 = 0, \]
\[ 2x_2 + 4x_3 + 7x_4 + 9x_5 + 12x_6 = 0, \]
\[ 2x_4 + 5x_5 + 4x_6 = 0. \]

(For simplicity, we have taken the right-hand sides of the equations to be zero.)

The first non-zero elements occur in the second column. The first row already
has entry 1 in this column, so we do not need to swap or divide; this entry will
be a leading 1. Subtracting twice the first row from the second, we obtain

\[ \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 \\ 0 & 0 & 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 2 & 5 & 4 \end{pmatrix}. \]

In terms of equations, we have expressed $x_2 = -2x_3 - 3x_4 - 4x_5 - 5x_6$, and
substituted in the second equation to obtain $x_4 + x_5 + 2x_6 = 0$. The third
equation is unaltered.
Now we consider, recursively, the matrix

\[ B = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 2 & 5 & 4 \end{pmatrix} \]

and the corresponding equations

\[ x_4 + x_5 + 2x_6 = 0, \]
\[ 2x_4 + 5x_5 + 4x_6 = 0. \]

Again, no swap or division is required, and subtracting twice the first row from
the second makes the new second row $(0\ 0\ 0\ 0\ 3\ 0)$. With the equations, we have
written $x_4 = -x_5 - 2x_6$ and substituted to obtain $3x_5 = 0$.

At the last step we have C = (000030). Dividing by 3 gives (000010), 
which is in echelon form. We have simply deduced that x; = 0. So we have solved 
the equations in terms of the free parameters 71,273,%g; our matrix in echelon 
form is 


ooo 
OS 
ae \) 
oe nS) 
ee 
ow uo 
Now, in Stage 2, we remove the 3 above the leading 1 in the second row by
subtracting three times the second row from the first, giving a new first row
$(0\ 1\ 2\ 0\ 1\ {-1})$; and the 1s above the leading 1 in the third row by
subtracting the third row from each of the first and second rows. This gives the
matrix
\[
\begin{pmatrix} 0 & 1 & 2 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}
\]
in reduced echelon. The corresponding solution of the equations is
\[
\begin{aligned}
x_2 &= -2x_3 + x_6, \\
x_4 &= -2x_6, \\
x_5 &= 0,
\end{aligned}
\]
where $x_1, x_3, x_6$ are arbitrary.
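The worked example can be checked mechanically. The following is a minimal sketch rather than a transcription of the two-stage algorithm: it uses exact rational arithmetic from Python's `fractions` module and, as a simplification, clears the entries above and below each leading 1 in a single pass. The function name `reduced_echelon` is our own choice.

```python
from fractions import Fraction

def reduced_echelon(rows):
    """Return the reduced echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a non-zero entry in this column.
        src = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivot_row], m[src] = m[src], m[pivot_row]        # interchange rows
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]    # create the leading 1
        for r in range(n_rows):                            # clear the rest of the column
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return m

A = [[0, 1, 2, 3, 4, 5],
     [0, 2, 4, 7, 9, 12],
     [0, 0, 0, 2, 5, 4]]
R = reduced_echelon(A)
print([[int(x) for x in row] for row in R])
# [[0, 1, 2, 0, 0, -1], [0, 0, 0, 1, 0, 2], [0, 0, 0, 0, 1, 0]]
```

The rows of the output are the three equations $x_2 = -2x_3 + x_6$, $x_4 = -2x_6$, $x_5 = 0$ found above.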


There is another way we can view this theorem. For each elementary row 
operation, there is a corresponding elementary matrix obtained by performing 
that operation on the identity matrix. For example, if there are two rows, then the 
operation ‘add twice the first row to the second’ corresponds to the elementary
matrix $\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}$.


Proposition 4.21 (a) Let $E$ be the elementary matrix corresponding to a
given elementary row operation $T$. Then the matrix obtained by performing the
operation $T$ on $A$ is $EA$.

(b) The result of applying any sequence $T_1, T_2, \ldots, T_r$ of elementary row
operations to a matrix $A$ is $E_r \cdots E_2 E_1 A$, where $E_i$ is the elementary
matrix corresponding to $T_i$.


Proof (a) This is proved by multiplying the matrices to check, for each type
of row operation. For example, if $T$ is the operation of adding $\lambda$ times the
$i$th row to the $j$th, then $E$ is the matrix with diagonal entries equal to 1,
$(j,i)$ entry $\lambda$, and all other entries zero. Then the $(k,l)$ entry of $EA$
is equal to $\lambda a_{il} + a_{jl}$ if $k = j$, and is $a_{kl}$ otherwise; that
is, $EA$ is the matrix obtained from $A$ by the operation $T$. The argument in the
other cases is similar.
(b) This is now obvious.


Note that any elementary matrix is invertible, since any elementary row 
operation can be ‘undone’ by another operation of the same type. 
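As an illustration of Proposition 4.21 and of the remark on invertibility, here is a small sketch using plain integer lists. The helper names (`identity`, `add_multiple`, `matmul`) are our own, and only the operation of type (Er1) is shown.

```python
def identity(n):
    """The n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def add_multiple(matrix, src, dst, scalar):
    """Row operation: add `scalar` times row `src` to row `dst` (0-based indices)."""
    out = [row[:] for row in matrix]
    out[dst] = [d + scalar * s for d, s in zip(out[dst], out[src])]
    return out

def matmul(X, Y):
    """Ordinary matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, 3],
     [4, 5, 6]]
# 'Add twice the first row to the second', performed on the identity matrix.
E = add_multiple(identity(2), 0, 1, 2)
# The operation that undoes it: subtract twice the first row from the second.
E_inv = add_multiple(identity(2), 0, 1, -2)

print(E)                                          # [[1, 0], [2, 1]]
print(matmul(E, A) == add_multiple(A, 0, 1, 2))   # True
print(matmul(E_inv, E) == identity(2))            # True
```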


Theorem 4.22 (a) Any invertible matrix is a product of elementary matrices.
(b) For any matrix $A$, there is an invertible matrix $P$ such that $PA$ is in
reduced echelon form.


Proof (a) Let $A$ be an invertible $n \times n$ matrix. By the previous result,
there exist elementary matrices $E_1, E_2, \ldots, E_r$ so that
$B = E_r \cdots E_1 A$ is in reduced echelon form. Now $B$ must be the identity
matrix. For $B$ is invertible, so no row of $B$ can be zero. Let the leading 1 in
the $i$th row of $B$ occur in column $m_i$. Then
$1 \le m_1 < m_2 < \cdots < m_n \le n$. So we have $m_i = i$ for every $i$. Now,
since every column contains a leading 1, all the elements apart from the leading
1s are zero. So $B$ has entries 1 on the diagonal and 0 elsewhere; that is, $B = I$.

Now it follows that $A = E_1^{-1} E_2^{-1} \cdots E_r^{-1}$ is a product of
elementary matrices.

(b) This follows immediately from part (b) of the Proposition together with
Theorem 4.20.


In order to complete the analysis, we define elementary column opera- 
tions in a similar way to the elementary row operations: 


(Ec1) Add a scalar multiple of one column to another.
(Ec2) Multiply a column by a non-zero scalar. 
(Ec3) Interchange two columns. 


Suppose that the matrix $A$ is in reduced echelon form. Let $A = (a_{ij})$ have
its leading 1s in row $i$ and column $m_i$, for $1 \le i \le r$, with
$m_1 < m_2 < \cdots < m_r$. Apart from the leading 1s, any non-zero element
$a_{ij}$ occurs in a row $i \le r$ and a column $j > m_i$ which is not of the form
$m_k$ for any $k$. Now column $m_i$ has entry 1 in the $i$th row and zeros
everywhere else; so if we subtract $a_{ij}$ times column $m_i$ from column $j$, we
replace $a_{ij}$ by zero and do not change any other element of the matrix. Thus,
by applying a number of operations of type (Ec1), we produce a matrix in which all
the elements apart from the leading 1s are zero.

Now use operations of type (Ec3) to swap columns 1 and $m_1$, 2 and $m_2$, ...,
$r$ and $m_r$ (if necessary). The resulting matrix has its leading 1s in position
$(i,i)$ for $i = 1, \ldots, r$. In other words, it has the form
$\begin{pmatrix} I & O \\ O & O \end{pmatrix}$, where the identity matrix $I$ has
size $r \times r$. This is the canonical form of $A$ under ‘equivalence’.

Furthermore, just as elementary row operations can be performed by multi- 
plying on the left by elementary matrices, so elementary column operations can 
be performed by multiplying on the right (except that it is necessary to trans- 
pose the elementary matrices in order to have the correct effect). As before, a 
product of elementary matrices is invertible. 

So we have given an algorithmic proof of Theorem 4.19. 


Our algorithm for converting a matrix into the canonical form for equivalence 
was based on the idea of performing first the row operations required to convert 
it to reduced echelon, and then finishing with comparatively simple column oper- 
ations. This is efficient for many purposes. For example, to calculate the rank of 
a matrix, apply row operations until it is in echelon form (it is not necessary to 
continue to reduced echelon), and count the number of non-zero rows. 

However, if we do not insist on the strict separation between row and column 
operations, it is possible to reduce a matrix more simply, as we now see. The 
advantage of this simpler method is that it applies (with suitable modification) 
in a more general situation, as we see in the next section. 
Algorithm for canonical form under equivalence We are given an $m \times n$
matrix $A$.

If $A = O$, we are finished.

Otherwise, suppose that $a_{ij}$ is a non-zero element. By swapping the first row
with the $i$th (if $i > 1$) and the first column with the $j$th (if $j > 1$), we
may assume that $i = j = 1$.

Multiplying the first row by $a_{11}^{-1}$, we may assume that $a_{11} = 1$.

Now subtracting $a_{i1}$ times the first row from the $i$th (for $i > 1$) and
subtracting $a_{1j}$ times the first column from the $j$th (for $j > 1$), we may
assume that all entries in the first row and column other than $a_{11}$ are zero.

Let B be the matrix obtained by deleting the first row and column of A. 
Recursively reduce B to canonical form. Then re-insert the first row and column 
to find the canonical form of A. 
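The algorithm can be sketched in a few lines of code. This is an illustration under our own assumptions, not part of the text: the field is taken to be the rationals (via Python's `fractions`), the name `canonical_form` is ours, and the recursion on the smaller matrix $B$ is replaced by a loop counter `k` marking how many rows and columns have already been dealt with.

```python
from fractions import Fraction

def canonical_form(rows):
    """Reduce a rational matrix to the block form (I O; O O); return (matrix, rank)."""
    a = [[Fraction(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    k = 0  # rows and columns 0..k-1 are already in final form
    while k < min(m, n):
        # Look for a non-zero entry in the block that has not been processed.
        pos = next(((i, j) for i in range(k, m) for j in range(k, n)
                    if a[i][j] != 0), None)
        if pos is None:
            break                                  # the remaining block is zero
        i, j = pos
        a[k], a[i] = a[i], a[k]                    # row interchange
        for row in a:                              # column interchange
            row[k], row[j] = row[j], row[k]
        pivot = a[k][k]
        a[k] = [x / pivot for x in a[k]]           # scale so that the pivot is 1
        for r in range(k + 1, m):                  # clear the column below the pivot
            f = a[r][k]
            a[r] = [x - f * y for x, y in zip(a[r], a[k])]
        for c in range(k + 1, n):                  # clear the row to the right
            f = a[k][c]
            for r in range(m):
                a[r][c] -= f * a[r][k]
        k += 1
    return a, k

form, rank = canonical_form([[0, 1, 2, 3, 4, 5],
                             [0, 2, 4, 7, 9, 12],
                             [0, 0, 0, 2, 5, 4]])
print(rank)                                        # 3
print([[int(x) for x in row] for row in form])
# [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]]
```

The returned value of `k` is the size of the identity block, that is, the rank of the matrix.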


4.11 Determinants. For small matrices, it is very common to define 
the determinant by writing down the formula and deducing the interesting 
properties. Thus, for example, 


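if $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then $\det(A) = ad - bc$;

if $A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$, then
$\det(A) = aei + bfg + cdh - afh - bdi - ceg$.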


For larger matrices, the formulae become too cumbersome. (We will see that the 
formula for the determinant of an n x n matrix has n! terms.) So we give an 
axiomatic treatment instead. 

We define ‘a’ determinant function to be a function satisfying three axioms. 
We prove that there is a unique such function. After this is done, we are justified 
in referring to ‘the’ determinant. 


Definition A determinant function on $n \times n$ matrices over a field $F$ is a
function $\det : M_n(F) \to F$ satisfying the following axioms:
(D1) $\det(A)$ is a linear function of the $i$th row of $A$ (keeping the other rows
constant), for each $i$.
(D2) If two rows of $A$ are equal, then $\det(A) = 0$.
(D3) $\det(I) = 1$, where $I$ is the identity matrix.


Condition (D1) means the following. If $A$, $A'$, $A''$ are three matrices which
have the same entries in all rows except the $i$th, and if their $i$th rows are
respectively $R_i$, $R_i'$, $R_i''$, where $R_i'' = cR_i + c'R_i'$ for some
$c, c' \in F$, then $\det(A'') = c\det(A) + c'\det(A')$. For example,
\[
\det\begin{pmatrix} x+3y & 2x+4y \\ 5 & 6 \end{pmatrix}
= x \det\begin{pmatrix} 1 & 2 \\ 5 & 6 \end{pmatrix}
+ y \det\begin{pmatrix} 3 & 4 \\ 5 & 6 \end{pmatrix}.
\]
Theorem 4.23 There is a unique determinant function on $M_n(F)$, for any positive
integer $n$ and field $F$.


Proof The key to this theorem is to study the effects of the three types of 
elementary row operation on the determinant. 
Let A be any n x n matrix over F. 


(ED1): If $B$ is obtained from $A$ by adding $c$ times the $i$th row to the $j$th,
then $\det(B) = \det(A)$. For let $C$ be the matrix obtained from $A$ by replacing
the $j$th row with a copy of the $i$th. Then $\det(C) = 0$ by (D2), since $C$ has
two equal rows. By (D1),


det(B) = det(A) + cdet(C) = det(A). 


(ED2): If B is obtained from A by multiplying the ith row by c, then det(B) = 
cdet(A). This is immediate from (D1). 

(ED3): If $B$ is obtained from $A$ by interchanging two rows, then
$\det(B) = -\det(A)$. For we noted, after the three types of elementary operation
were defined, that the operation (Er3) (interchanging two rows) can be realised
by composing three operations of type (Er1) (which do not change the determinant,
by point (ED1)) with one of type (Er2), namely multiplying a row by $-1$ (which
multiplies the determinant by $-1$, by point (ED2)).


Now there is a sequence of elementary row operations which converts $A$ to
reduced echelon form (say $B$). Thus we have
\[
E_r \cdots E_2 E_1 A = B,
\]
where $E_1, \ldots, E_r$ are elementary matrices. Since $A$ is square, either $B$
has a zero row, or $B$ is the identity $I$. In the first case, $\det(B) = 0$ by
(D1); in the second, $\det(B) = 1$, by (D3). Moreover, each elementary operation
multiplies $\det(A)$ by a non-zero factor which depends only on the operation
applied. So, as a result, $\det(B) = c_r \cdots c_1 \det(A)$, where
$c_1, \ldots, c_r$ are the factors associated with the operations. Since these
factors are non-zero, $\det(A)$ is determined uniquely.
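The uniqueness argument is also a practical recipe: reduce the matrix while recording the factors $c_i$. Below is a minimal sketch over the rationals; the function name `det_by_row_ops` and the bookkeeping variable `factor` are our own, and the comments indicate which of (ED1)–(ED3) each step relies on.

```python
from fractions import Fraction

def det_by_row_ops(rows):
    """Determinant of a square rational matrix, found by reducing it to I."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    factor = Fraction(1)            # product of the factors c_i applied so far
    for col in range(n):
        src = next((r for r in range(col, n) if a[r][col] != 0), None)
        if src is None:
            return Fraction(0)      # the reduced echelon form has a zero row
        if src != col:
            a[col], a[src] = a[src], a[col]
            factor *= -1            # (ED3): an interchange changes the sign
        pivot = a[col][col]
        a[col] = [x / pivot for x in a[col]]
        factor *= 1 / pivot         # (ED2): the row was multiplied by 1/pivot
        for r in range(n):
            if r != col:
                f = a[r][col]
                # (ED1): adding a multiple of one row to another changes nothing
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    # The matrix is now I, so factor * det(A) = det(I) = 1.
    return 1 / factor

print(det_by_row_ops([[2, 1, 0], [1, 3, 1], [0, 1, 4]]))   # 18
print(det_by_row_ops([[0, 1], [1, 0]]))                    # -1
print(det_by_row_ops([[1, 2], [2, 4]]))                    # 0
```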


We have shown that, if there exists a determinant function, then it is unique. 
It remains to show that there really is such a function. This is somewhat 
technical, and you may want to skip the next argument at first reading. 

Recall that the sign of a permutation $g \in S_n$ is equal to $(-1)^p$, where $p$,
the parity of $g$, is equal to the parity of $n$ minus the number of cycles of $g$.
We write it as $\mathrm{sign}(g)$. Now consider the following function $D(A)$ of
$n \times n$ matrices $A = (a_{ij})$:
\[
D(A) = \sum_{g \in S_n} \mathrm{sign}(g)\, a_{1\,1g}\, a_{2\,2g} \cdots a_{n\,ng},
\]
a sum of $n!$ terms, each a product of $n$ factors $a_{ij}$ and a sign. We claim
that the function $D$ satisfies (D1)–(D3).
(D1): Each term in the sum involves just one entry from the $i$th row, namely
$a_{i\,ig}$, and so is a linear function of the $i$th row. The same is true of the
sum.

(D2): Suppose that the $i$th and $j$th rows are equal, that is, $a_{ik} = a_{jk}$
for all $k$. Let $t$ be the transposition $(i\ j)$. Then $H = \{1, t\}$ is a
subgroup of $S_n$, and so $S_n$ can be partitioned into $n!/2$ right cosets of $H$.
Consider the two terms in $D(A)$ corresponding to the elements $g$, $tg$ of a
coset $Hg$. The factors taken from rows other than the $i$th and $j$th are the
same. From the $i$th and $j$th rows, we take $a_{i\,ig}\,a_{j\,jg}$ and
$a_{i\,itg}\,a_{j\,jtg}$. But these are equal, since $it = j$, $jt = i$, and
$a_{ik} = a_{jk}$. Moreover, $g$ and $tg$ differ by a transposition, so have
opposite signs. Thus, the two terms cancel. Since this holds for each coset,
$D(A) = 0$.

(D3): For $A = I$, the only non-zero entries are the diagonal entries $a_{ii}$, and
so the only non-zero term in the expression for $D(A)$ is the one coming from the
identity permutation. Since $a_{ii} = 1$ for all $i$, and the identity has sign 1,
the value of $D(I)$ is 1.
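For small matrices the function $D$ can be evaluated directly from its definition. The sketch below is an illustration only (the number of terms grows like $n!$); the names `sign` and `D` follow the text, permutations are represented as tuples on $\{0, \ldots, n-1\}$, and the sign is computed from the number of cycles exactly as recalled above.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of range(n): (-1) ** (n - number of cycles)."""
    n, seen, cycles = len(perm), set(), 0
    for start in range(n):
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:        # walk round the cycle containing start
                seen.add(x)
                x = perm[x]
    return -1 if (n - cycles) % 2 else 1

def D(a):
    """Sum over all permutations g of sign(g) * a[0][g(0)] * ... * a[n-1][g(n-1)]."""
    n = len(a)
    total = 0
    for g in permutations(range(n)):
        term = sign(g)
        for i in range(n):
            term *= a[i][g[i]]
        total += term
    return total

print(D([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))   # 1, as required by (D3)
print(D([[1, 2, 3], [4, 5, 6], [1, 2, 3]]))   # 0, two equal rows, as in (D2)
print(D([[2, 1, 0], [1, 3, 1], [0, 1, 4]]))   # 18
```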


In order to prove the important properties of determinants, we look once 
more at the elementary row operations. 

We associated with each such operation $T$ a factor $c(T)$, by which the
determinant of $A$ is multiplied when $T$ is applied to $A$. We also have an
elementary matrix $E(T)$, obtained by applying $T$ to the identity matrix.


Proposition 4.24 $c(T) = \det(E(T))$.


Proof This is clear from the fact that $\det(I) = 1$.


Theorem 4.25 (a) For any $A, B \in M_n(F)$, we have
$\det(AB) = \det(A)\det(B)$.
(b) $\det(A) \ne 0$ if and only if $A$ is invertible.


Proof There is an invertible matrix $P$ (a product of elementary matrices) such
that $PA$ is in reduced echelon form; thus, either $PA$ has a zero row (if $A$ is
not invertible), or $PA = I$ (if $A$ is invertible). Moreover, if
$P = E_r \cdots E_1$, then
$c(E_r) \cdots c(E_1) \det(A) = \det(PA)$.

If $A$ is not invertible, then $\det(PA) = 0$, and so $\det(A) = 0$ (since the
factors $c(E_i)$ are non-zero). If $A$ is invertible, then $PA = I$, so
$\det(PA) = 1$. Thus $c(E_r) \cdots c(E_1) \det(A) = 1$, and so $\det(A) \ne 0$.
This proves (b).

If $A$ is not invertible, then neither is $AB$, and so
$\det(AB) = 0 = 0 \cdot \det(B) = \det(A)\det(B)$. Suppose that $A$ is invertible.
Then, as above,
\[
c(E_r) \cdots c(E_1) \det(A) = 1.
\]
Now the same sequence of elementary operations, applied to $AB$, yields the
equation $E_r \cdots E_1 AB = PAB = IB = B$, and so
\[
c(E_r) \cdots c(E_1) \det(AB) = \det(B).
\]
It follows from these two equations that
\[
\det(A)^{-1} \det(AB) = c(E_r) \cdots c(E_1) \det(AB) = \det(B),
\]
so that $\det(AB) = \det(A)\det(B)$, proving (a).


Theorem 4.26 $\det(A) = \det(A^\top)$, where $A^\top$ is the transpose of the
matrix $A$.
Proof A typical term in the formula for $\det(A^\top)$ is
\[
\mathrm{sign}(g)\, a_{1g\,1}\, a_{2g\,2} \cdots a_{ng\,n}.
\]
Now, if $h$ is the inverse of $g$, then $\mathrm{sign}(h) = \mathrm{sign}(g)$
(since $g$ and $h$ have the same cycle structure); so we can re-order the factors
and write this term as
\[
\mathrm{sign}(h)\, a_{1\,1h}\, a_{2\,2h} \cdots a_{n\,nh}.
\]
As $g$ ranges over the symmetric group, so does its inverse; so we obtain all the
terms in $\det(A)$, and conclude that $\det(A^\top) = \det(A)$.
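Theorems 4.25(a) and 4.26 are easy to spot-check numerically. The sketch below is only a sanity check on one pair of integer matrices chosen by us, not a proof. The determinant is computed from the sum over permutations, with the sign obtained by counting inversions (a standard equivalent of the cycle-count rule used in the text).

```python
from itertools import permutations

def det(a):
    """Determinant as a sum over permutations (practical for small n only)."""
    n = len(a)
    total = 0
    for g in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if g[i] > g[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= a[i][g[i]]
        total += term
    return total

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[1, 2, 0], [3, 1, 4], [0, 2, 5]]
B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]

print(det(A), det(B), det(matmul(A, B)))   # -33 5 -165
print(det(transpose(A)) == det(A))         # True
```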


Determinants over rings It is possible to define determinants of matrices 
whose entries are taken from any commutative ring. There are several ways to 
do this. One of these is an axiomatic approach similar to the one we used before. 
It is easier to use what we already know. 

If $R$ is an integral domain, then it can be embedded in a field $F$ (its field of
fractions, see Section 2.14). Now any matrix over $R$ can be regarded as a matrix
over $F$, and its determinant calculated as before. One advantage of this approach
is that our previous results will apply: for example, the axioms (D1)–(D3)
all hold.

If $R$ is an arbitrary commutative ring, we can define the determinant of an
$n \times n$ matrix over $R$ using the formula we worked out in Theorem 4.23. This
time, however, we cannot assume that earlier results apply, but must rework the
proofs. As an example, if $R$ is a zero ring, then the determinant of any
$n \times n$ matrix over $R$ is zero for $n > 1$.

One final method can be applied when the ring $R$ contains a subring $K$ which
is a field. We consider the matrix $X = (x_{ij})$ whose entries are independent
indeterminates over $K$ (that is, elements of the polynomial ring
$S = K[x_{11}, \ldots, x_{nn}]$). Now $S$ is an integral domain, so we can compute
the determinant of $X$ in the field of fractions of $S$. Now we obtain the
determinant of an arbitrary matrix over $R$ by substituting elements of $R$ for
the indeterminates. (This substitution is a ring homomorphism from $S$ to $R$.)


Some special determinants The general formula for a determinant is very 
cumbersome even for moderate n. It is important that there are various spe- 
cial matrices whose determinants can be evaluated more simply by ‘product 
formulae’. In particular, it is easy to decide whether or not such determinants 
are zero. 

We consider two types here: Vandermonde determinants and circulants. 
Definition A Vandermonde determinant is one of the form
\[
V(a_1, a_2, \ldots, a_n) = \det\begin{pmatrix}
1 & 1 & \cdots & 1 \\
a_1 & a_2 & \cdots & a_n \\
a_1^2 & a_2^2 & \cdots & a_n^2 \\
\vdots & \vdots & & \vdots \\
a_1^{n-1} & a_2^{n-1} & \cdots & a_n^{n-1}
\end{pmatrix}
\]
for $a_1, a_2, \ldots, a_n \in F$.


Proposition 4.27 $V(a_1, a_2, \ldots, a_n) = \prod_{1 \le i < j \le n} (a_j - a_i)$.
In particular, the determinant is zero if and only if $a_i = a_j$ for some
$i \ne j$.


Proof First, we evaluate $V(x_1, x_2, \ldots, x_n)$, where $x_1, x_2, \ldots, x_n$
are indeterminates over $F$ (this is calculated in the field of fractions of
$F[x_1, x_2, \ldots, x_n]$). Once the formula of the proposition is established in
this situation, it is just a polynomial identity, and remains true if we
substitute $a_i$ for $x_i$, $i = 1, \ldots, n$.

Let us then regard $V$ as a polynomial $f(x_n)$ in $x_n$ with coefficients in the
field of fractions of $F[x_1, \ldots, x_{n-1}]$. Under the substitution
$x_n = x_i$, for $i < n$, the matrix has two equal columns, and so its determinant
is zero. So $(x_n - x_i)$ is a factor of $f(x_n)$. By inspection, $f$ has degree
$n-1$, so $x_1, \ldots, x_{n-1}$ are all its roots. Thus
\[
V(x_1, \ldots, x_n) = W(x_1, \ldots, x_{n-1}) \prod_{i<n} (x_n - x_i),
\]
where $W$ does not depend on $x_n$. Repeating this procedure for the other
variables, we see that
\[
V(x_1, \ldots, x_n) = Z \prod_{i<j} (x_j - x_i),
\]
where $Z$ does not depend on any of the variables; that is, $Z \in F$.

To evaluate $Z$, we consider the term in $x_2 x_3^2 \cdots x_n^{n-1}$ in the
expansion of $V$. In the formula for the determinant as a sum over permutations,
this term comes only from the identity permutation, and so its coefficient is
$+1$. In the product form, we must take $x_n$ from all $n-1$ factors containing
it, then $x_{n-1}$ from all $n-2$ factors which contain it but not $x_n$, and so
on. In other words, from each factor, we take the variable with the greater
suffix, which has the positive sign. So the coefficient is $+Z$. We conclude that
$Z = 1$, and the proposition is proved.
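Proposition 4.27 can be checked on a numerical instance. In this sketch the determinant is evaluated by brute force from the sum over permutations and compared with the product formula; the sample values 1, 2, 4, 7 and all function names are our own choices.

```python
from itertools import permutations

def det(a):
    """Determinant as a sum over permutations (practical for small n only)."""
    n = len(a)
    total = 0
    for g in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if g[i] > g[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= a[i][g[i]]
        total += term
    return total

def vandermonde_matrix(values):
    """Row k (k = 0, ..., n-1) holds the k-th powers of the given values."""
    return [[v ** k for v in values] for k in range(len(values))]

def product_formula(values):
    """Product of (a_j - a_i) over all pairs with i < j."""
    result = 1
    for j in range(len(values)):
        for i in range(j):
            result *= values[j] - values[i]
    return result

values = [1, 2, 4, 7]
print(det(vandermonde_matrix(values)), product_formula(values))   # 540 540
print(det(vandermonde_matrix([3, 5, 3])))                         # 0 (repeated value)
```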


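Definition A circulant $C(a_0, a_1, \ldots, a_{n-1})$ is a determinant of the form
\[
C(a_0, a_1, \ldots, a_{n-1}) = \det\begin{pmatrix}
a_0 & a_1 & \cdots & a_{n-2} & a_{n-1} \\
a_{n-1} & a_0 & \cdots & a_{n-3} & a_{n-2} \\
\vdots & \vdots & & \vdots & \vdots \\
a_1 & a_2 & \cdots & a_{n-1} & a_0
\end{pmatrix},
\]
where $a_0, \ldots, a_{n-1} \in F$.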
Proposition 4.28 Suppose that $F$ contains a primitive $n$th root of unity, say
$\omega$. Then
\[
C(a_0, a_1, \ldots, a_{n-1}) = \prod_{i=0}^{n-1}
\left( a_0 + a_1\omega^{i} + a_2\omega^{2i} + \cdots + a_{n-1}\omega^{(n-1)i} \right).
\]
Proof Consider the vector
$v_i = (1\ \ \omega^{-i}\ \ \omega^{-2i}\ \ \ldots\ \ \omega^{-(n-1)i})$. (We number
the coordinates from 0 to $n-1$, and where necessary we take them mod $n$.) Let
$A$ be the $n \times n$ matrix whose determinant is $C$. Also, let
\[
\lambda_i = a_0 + a_1\omega^{i} + \cdots + a_{n-1}\omega^{(n-1)i}.
\]
Then
\[
(v_i A)_j = a_j + a_{j-1}\omega^{-i} + \cdots + a_{j+1}\omega^{-(n-1)i}.
\]
But this is equal to $\lambda_i \omega^{-ij}$, in other words, $\lambda_i$ times
the $j$th entry of $v_i$. Thus $v_i A = \lambda_i v_i$.

The vectors $v_0, v_1, \ldots, v_{n-1}$ form a basis for $F^n$. (There are $n$ of
them, and they are linearly independent, since the determinant of the matrix with
columns $v_0^\top, v_1^\top, \ldots, v_{n-1}^\top$ is the Vandermonde determinant
$V(1, \omega^{-1}, \ldots, \omega^{-(n-1)})$; and the elements
$1, \omega^{-1}, \ldots, \omega^{-(n-1)}$ are all distinct, since they are all the
powers of a primitive $n$th root of unity.) So, if $P$ is the matrix whose rows
are $v_0, v_1, \ldots, v_{n-1}$, then $PAP^{-1}$ is the matrix representing the
same linear transformation as $A$ relative to the new basis. This matrix is
diagonal, and has as its diagonal entries
$\lambda_0, \lambda_1, \ldots, \lambda_{n-1}$. Hence
\[
\det(A) = \det(PAP^{-1}) = \prod_{i=0}^{n-1} \lambda_i,
\]
as required.
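To see Proposition 4.28 in action without floating-point arithmetic, one can work in a finite field. The sketch below makes assumptions that are ours, not the text's: $F$ is the field of integers mod 5, $n = 4$, and $\omega = 2$, which is a primitive 4th root of unity mod 5 (its powers are 2, 4, 3, 1). The coefficients 1, 2, 0, 3 are an arbitrary sample.

```python
from itertools import permutations

def det_mod(a, p):
    """Determinant mod p via the sum over permutations (small n only)."""
    n = len(a)
    total = 0
    for g in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if g[i] > g[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= a[i][g[i]]
        total += term
    return total % p

def circulant(coeffs):
    """Matrix whose (i, j) entry is a_{j-i}, with subscripts taken mod n."""
    n = len(coeffs)
    return [[coeffs[(j - i) % n] for j in range(n)] for i in range(n)]

def eigenvalue_product(coeffs, omega, p):
    """Product over i of lambda_i = sum over m of a_m * omega^(i*m), mod p."""
    result = 1
    for i in range(len(coeffs)):
        lam = sum(c * pow(omega, i * m, p) for m, c in enumerate(coeffs)) % p
        result = result * lam % p
    return result

coeffs = [1, 2, 0, 3]
print(det_mod(circulant(coeffs), 5))       # 2
print(eigenvalue_product(coeffs, 2, 5))    # 2
```

Over the integers this circulant has determinant $-48$, which is congruent to 2 mod 5, in agreement with both outputs.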


4.12 Matrices over Euclidean domains. We are going to prove a ‘canoni- 
cal form theorem’ for matrices over a Euclidean domain. (In fact, the same result 
holds more generally, over a principal ideal domain; but the proof needs an extra 
trick, and is not given here.) This theorem appears to be of no particular use. 
But we will see that, by applying it to the two most important examples of 
Euclidean domains (the ring of integers, and the polynomial ring over a field), 
we obtain two unexpected bonuses: a structure theorem for finitely generated 
abelian groups, and a canonical form for the matrix of a linear transformation 
from a vector space to itself. 

We start with a quick revision course on Euclidean domains. See Section 2.13 
for more details. A Euclidean domain is an integral domain R (a commutative 
ring with identity and no divisors of zero) having a Euclidean function d from 
the set of non-zero elements of R to the set of non-negative integers, satisfying 
the following conditions: 
(a) if $a$ and $b$ are non-zero, then $d(ab) \ge d(a)$;
(b) if $b \ne 0$ and $a$ is arbitrary, then there exist $q, r \in R$ with
$a = bq + r$ and either $r = 0$ or $d(r) < d(b)$.


A Euclidean domain is a principal ideal domain (each ideal is generated 
by a single element), and hence a unique factorisation domain. Moreover, the 
Euclidean Algorithm finds, for any two elements a and b, the greatest common 
divisor (g.c.d.) d of a and b, and elements x and y such that d = xa + yb.
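For the ring of integers (with Euclidean function $d(a) = |a|$), the Euclidean Algorithm and the accompanying coefficients can be sketched as follows. The function name `extended_gcd` is our own; for negative inputs the value returned may be the negative of the usual g.c.d., which is harmless since the g.c.d. is only defined up to a unit.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d a g.c.d. of the integers a, b and d == x*a + y*b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r                       # quotient in the division old_r = q*r + rem
        old_r, r = r, old_r - q * r          # the remainders of the algorithm
        old_x, x = x, old_x - q * x          # each remainder equals x*a + y*b
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

d, x, y = extended_gcd(240, 46)
print(d, x, y)                # 2 -9 47
print(x * 240 + y * 46)       # 2
```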


Let R be any commutative ring with identity. We define elementary row 
operations on matrices over R almost exactly as we did over a field. The only 
difference is that, in order to ensure that any elementary row operation can be 
undone, we restrict operation (Er2) by allowing only multiplication by units. In 
detail, the operations are: 


(Er1) Add any multiple of one row to another. 
(Er2) Multiply a row by a unit of R. 
(Er3) Interchange two rows. 


We also define the analogous elementary column operations (Ec1), (Ec2),
and (Ec3).


Theorem 4.29 Let $A$ be an $m \times n$ matrix over a Euclidean domain $R$. Then
$A$ can be transformed, by means of elementary row and column operations, to a
matrix of the form $\begin{pmatrix} D & O \\ O & O \end{pmatrix}$, where $D$ is an
$r \times r$ matrix with diagonal entries $d_1, d_2, \ldots, d_r$ and zeros
elsewhere, and $O$ is a zero matrix of any appropriate size. Moreover,
$d_i \ne 0$ for $1 \le i \le r$, and $d_i$ divides $d_{i+1}$ for
$1 \le i \le r-1$. The number $r$ is uniquely determined by $A$, and the elements
$d_1, \ldots, d_r$ are unique up to associates.


This theorem generalises Theorem 4.19, since if R is a field then any non-zero 
element of R is associate to 1. 
The ring elements $d_1, d_2, \ldots, d_r$ are called the invariant factors of the
matrix $A$. The matrix $\begin{pmatrix} D & O \\ O & O \end{pmatrix}$ is called the
Smith normal form of $A$.

To prove that the required form exists, it is enough to prove the following:

By performing elementary row and column operations, we can
convert $A$ into a matrix $B$ such that the element $d_1 = b_{11}$ divides
every element in $B$ and the remaining elements in the first row
and column are all zero.

For then $B$ has the form
\[
B = \begin{pmatrix} d_1 & O \\ O & C \end{pmatrix},
\]
where $O$ is a $1 \times (n-1)$ row or an $(m-1) \times 1$ column of zeros, and
every element of $C$ is divisible by $d_1$. By induction, $C$ can be converted to
Smith normal form by elementary row and column operations. Suppose that its
invariant factors are $d_2, \ldots, d_r$, where $d_i$ divides $d_{i+1}$ for
$2 \le i \le r-1$. The elementary operations applied to $C$ do not change the
property that all of its elements are divisible by $d_1$; so $d_1$ divides $d_2$,
as required.

We prove that, if not every element of $A$ is divisible by $a_{11}$, then we can
apply elementary operations to find a matrix $A'$ with $d(a'_{11}) < d(a_{11})$.


Case 1 Suppose that some element of the first row is not divisible by $a_{11}$,
say $a_{1j}$, for $j > 1$. By the Euclidean property, we can write
$a_{1j} = a_{11}q + r$, where $r \ne 0$ (since $a_{11}$ does not divide $a_{1j}$)
and hence $d(r) < d(a_{11})$. Now subtract $q$ times the first column from the
$j$th column. The new $(1,j)$ entry is $a_{1j} - qa_{11} = r$. Now interchange the
first and $j$th columns, to obtain a matrix with $(1,1)$ entry $r$.


Case 2 If some element of the first column is not divisible by $a_{11}$, the
argument is similar, but using row operations instead of column operations.


Case 3 Finally, suppose that $a_{11}$ divides every element in the first row and
column, but does not divide the entry $a_{ij}$. Suppose that $a_{i1} = xa_{11}$.
Subtracting $x - 1$ times the first row from the $i$th, we obtain a matrix with
$(i,1)$ entry equal to $a_{11}$. The new $(i,j)$ entry is
$a_{ij} - (x-1)a_{1j} = b_{ij}$, say. Since $a_{11}$ divides $a_{1j}$ but not
$a_{ij}$, it does not divide $b_{ij}$. Write $b_{ij} = qa_{11} + r$, where
$r \ne 0$ and $d(r) < d(a_{11})$. Now, subtracting $q$ times the first column from
the $j$th, we obtain a matrix with $(i,j)$ entry $b_{ij} - qa_{11} = r$. Finally,
a row interchange and a column interchange bring $r$ to the $(1,1)$ position.


Now the value of $d(a_{11})$ can only decrease a finite number of times, since it
is a non-negative integer. So, after a finite number of steps, we reach a matrix
all of whose elements are divisible by $a_{11}$.


Finally, if $a_{1j} = q_j a_{11}$ for $j > 1$, then subtracting $q_j$ times the
first column from the $j$th produces a matrix with $(1,j)$ entry zero. Similarly,
by row operations, we may assume that all entries $a_{i1}$ are zero for $i > 1$.


This completes the existence proof. What about the uniqueness of the 
elements d; (up to associates)? 


In the first place, $d_1$ is determined: it is the greatest common divisor of all
the elements in $A$. For it is easily checked that the elementary operations do
not alter the greatest common divisor of all the entries up to associates (that
is, at worst the g.c.d. is multiplied by a unit). For the final matrix, the
greatest common divisor is $d_1$, since this is an entry and it divides all the
others.

The other invariant factors are determined by a somewhat more complicated rule:

For $1 \le i \le r$, the greatest common divisor of all the determinants
of $i \times i$ submatrices of $A$ is equal to $d_1 d_2 \cdots d_i$; for
$i > r$, this greatest common divisor is zero.

The proof is similar: one shows that the greatest common divisor of the
determinants of the $i \times i$ submatrices is multiplied by a unit (hence not
changed, up to associates) by elementary operations, while the greatest common
divisor for a matrix in the Smith normal form is $d_1 d_2 \cdots d_i$ if
$i \le r$, or 0 if $i > r$.

So the theorem is proved. 


Example Consider the matrix $\begin{pmatrix} 4 & 6 \\ 8 & 10 \end{pmatrix}$ over
the integers. The greatest common divisor of the entries is 2, while the
determinant is $-8$; so we expect the Smith normal form to be
$\begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}$. Let us see what operations achieve
this result.

First, 4 does not divide 6; indeed, $6 = 4 + 2$. So subtract the first column from
the second, to obtain $\begin{pmatrix} 4 & 2 \\ 8 & 2 \end{pmatrix}$. Now
interchange the first and second columns, obtaining
$\begin{pmatrix} 2 & 4 \\ 2 & 8 \end{pmatrix}$.

Now 2 does divide all the other entries, so we subtract twice the first column
from the second, giving $\begin{pmatrix} 2 & 0 \\ 2 & 4 \end{pmatrix}$, and then
subtract the first row from the second, giving
$\begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}$.

The final result is in Smith normal form.
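The reduction can also be carried out by program. The following is a sketch for integer matrices only, with $d(a) = |a|$, and it is a variant of the proof rather than a literal transcription: instead of distinguishing Cases 1 to 3, it always moves a non-zero entry of smallest absolute value to the pivot position, reduces the pivot's row and column, and, if some remaining entry is not divisible by the pivot, adds the offending row to the pivot row so that a smaller remainder appears. The function name `smith_normal_form` is our own.

```python
def smith_normal_form(rows):
    """Smith normal form of an integer matrix, using row and column operations."""
    a = [row[:] for row in rows]
    m, n = len(a), len(a[0])
    for k in range(min(m, n)):
        while True:
            # Non-zero entries of the block not yet processed, smallest |value| first.
            entries = [(abs(a[i][j]), i, j) for i in range(k, m)
                       for j in range(k, n) if a[i][j] != 0]
            if not entries:
                return a                           # the remaining block is zero
            _, i, j = min(entries)
            a[k], a[i] = a[i], a[k]                # row interchange (Er3)
            for row in a:                          # column interchange (Ec3)
                row[k], row[j] = row[j], row[k]
            pivot = a[k][k]
            for r in range(k + 1, m):              # reduce column k modulo the pivot
                q = a[r][k] // pivot
                a[r] = [x - q * y for x, y in zip(a[r], a[k])]
            for c in range(k + 1, n):              # reduce row k modulo the pivot
                q = a[k][c] // pivot
                for r in range(m):
                    a[r][c] -= q * a[r][k]
            if (any(a[r][k] for r in range(k + 1, m))
                    or any(a[k][c] for c in range(k + 1, n))):
                continue                           # a smaller remainder has appeared
            bad = next((r for r in range(k + 1, m) for c in range(k + 1, n)
                        if a[r][c] % pivot != 0), None)
            if bad is None:
                break                              # the pivot divides every entry
            a[k] = [x + y for x, y in zip(a[k], a[bad])]   # (Er1), then repeat
        if a[k][k] < 0:
            a[k] = [-x for x in a[k]]              # multiply by the unit -1 (Er2)
    return a

print(smith_normal_form([[4, 6], [8, 10]]))   # [[2, 0], [0, 4]]
print(smith_normal_form([[2, 0], [0, 3]]))    # [[1, 0], [0, 6]]
```

The second call shows that a diagonal matrix need not already be in Smith normal form: 2 does not divide 3, so the invariant factors are 1 and 6.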


Exercise 4.7 (a) Prove the First Isomorphism Theorem for vector spaces. 
(b) Formulate and prove the Second and Third Isomorphism Theorems. 


Exercise 4.8 Let $A = (a_{ij})$ be an $n \times n$ matrix over the field $F$.
Define the $(i,j)$ cofactor $A_{ij}$ of $A$ to be $(-1)^{i+j}$ times the
determinant of the matrix obtained from $A$ by deleting the $i$th row and $j$th
column.


(a) Prove that $\sum_{j=1}^{n} a_{ij}A_{ij} = \det(A)$. [Hint: Show that the
left-hand side satisfies the axioms for a determinant.]

(b) Prove that, for $i \ne k$, $\sum_{j=1}^{n} a_{ij}A_{kj} = 0$. [Hint: By (a),
this is the determinant of a matrix with two equal rows.]

(c) Define the adjoint of $A$ to be the matrix $\mathrm{Adj}(A)$ whose $(i,j)$
entry is $A_{ji}$. Prove that $A \cdot \mathrm{Adj}(A) = \det(A)I$.


Exercise 4.9 Put the integer matrix 


\[
\begin{pmatrix} 6 & 10 & 0 \\ 6 & 0 & 15 \\ 0 & 10 & 15 \end{pmatrix}
\]


into Smith normal form. Check your result by calculating determinants. 


Exercise 4.10 Show that the reduced echelon form of a matrix over a field is unique. 


5 Modules 


A module bears the same relationship to a vector space as a ring does to a field. 
As this suggests, modules exist in much greater profusion than do vector spaces. 
We know everything about a vector space over a given field F' once we know its 
dimension; but to specify a module, more detailed information is required. One 
of the themes of modern algebra is that the modules for a given ring capture 
a good deal of the structure of the ring. Our main goal in this chapter is the 
description of modules over Euclidean domains. The structure theorem tells us 
about finitely generated abelian groups, and also gives us canonical forms for 
matrices over fields. 


Introduction 


5.1 Definition of modules. A module is a ‘vector space over a ring’. That 
is, it satisfies almost exactly the same axioms as a vector space, with scalars 
taken from a ring rather than a field. There are two small differences, resulting 
from the extra generality. First, since a ring does not necessarily have an identity 
element, we cannot impose the axiom involving the identity. Second, since ring 
multiplication may not be commutative, there are two forms of the axiom involv- 
ing multiplication of scalars, leading to two different kinds of module. There is 
also a notational difference: sometimes we choose to write scalars on the right, 
instead of on the left as we did for vector spaces. 

Formally, then, let $R$ be a ring. We define a right $R$-module to be a set $M$
with a binary operation of addition (written $+$) and, for each $r \in R$, a unary
operation of scalar multiplication by $r$ (where we write the result of
multiplying $m$ by $r$ as $mr$), satisfying the following axioms:


(MA0) For all $m_1, m_2 \in M$, we have $m_1 + m_2 \in M$.
(MA1) For all $m_1, m_2, m_3 \in M$, we have $m_1 + (m_2 + m_3) = (m_1 + m_2) + m_3$.
(MA2) There exists $0 \in M$ such that $m + 0 = 0 + m = m$ for all $m \in M$.
(MA3) For all $m \in M$, there exists $-m \in M$ such that
$m + (-m) = (-m) + m = 0$.
(MA4) For all $m_1, m_2 \in M$, $m_1 + m_2 = m_2 + m_1$.
(MM0) For all $r \in R$ and $m \in M$, we have $mr \in M$.
(MM1) For all $r \in R$, $m_1, m_2 \in M$, we have $(m_1 + m_2)r = m_1r + m_2r$.
(MM2) For all $r_1, r_2 \in R$, $m \in M$, we have $m(r_1 + r_2) = mr_1 + mr_2$.
(MM3) For all $r_1, r_2 \in R$, $m \in M$, we have $m(r_1r_2) = (mr_1)r_2$.

If the ring $R$ has an identity 1, then we call $M$ a unital module if it also
satisfies


(MM4) For all $m \in M$ we have $m1 = m$.


Axiom (MM3) says that scalar multiplication by $r_1r_2$ is the same as
multiplication by first $r_1$ and then $r_2$. We could imagine structures in which
this works the other way around, that is, first $r_2$, then $r_1$. These are
called left $R$-modules. It would be possible simply to write an alternative
version of (MM3) for left modules. But it is more natural to change the notation,
so that scalar multiplication by $r$ takes $m$ to $rm$ (with the scalar on the
left). Formally, then, we define a left $R$-module to be a set $M$ with a binary
operation of addition and a unary operation of scalar multiplication by $r$ for
each $r \in R$ (written $rm$) such that axioms (MA0)–(MA4) above hold, and also


(MM0′) For all $r \in R$ and $m \in M$, we have $rm \in M$.
(MM1′) For all $r \in R$, $m_1, m_2 \in M$, we have $r(m_1 + m_2) = rm_1 + rm_2$.
(MM2′) For all $r_1, r_2 \in R$, $m \in M$, we have $(r_1 + r_2)m = r_1m + r_2m$.
(MM3′) For all $r_1, r_2 \in R$, $m \in M$, we have $(r_1r_2)m = r_1(r_2m)$.


Note that (MM0′)–(MM2′) are identical to (MM0)–(MM2), but written in
different notation; only (MM3′) is really different.

Again, if $R$ has an identity 1, the left $R$-module is called unital if $1m = m$
for all $m \in M$.

Just as for a vector space, we can express the axioms more briefly in abstract
language. Let $M$ be a right $R$-module. Axioms (MA0)–(MA4) assert that $M$,
with the operation of addition, is an abelian group. The map
$\theta_r : M \to M$ given by $m\theta_r = mr$ is an endomorphism of the abelian
group $M$ (a homomorphism from $M$ to itself). The set of all endomorphisms forms
a ring $\mathrm{End}(M)$, and the map $\phi : R \to \mathrm{End}(M)$ given by
$r\phi = \theta_r$ is a homomorphism. So we could say: A right $R$-module is an
abelian group $M$ with a homomorphism from $R$ to $\mathrm{End}(M)$. Moreover, it
is a unital module if and only if the homomorphism $\phi$ maps the identity of $R$
to the identity of $\mathrm{End}(M)$.

For left modules, there is a complication. Our definition of the endomorphism
ring of an abelian group takes multiplication of endomorphisms to be composition
in the usual order. It is necessary to reverse this order. Accordingly, given any
ring $R$, we define the opposite ring R° as follows: the elements of R° are the
elements of $R$ and addition is the same as in $R$; but multiplication (which we
will denote by $\circ$) is given by the rule
\[
r_1 \circ r_2 = r_2 r_1.
\]
It can be shown that R° is a ring, and shares most of the properties of $R$. Now
we can say: A left $R$-module consists of an abelian group $M$ together with a
homomorphism $\psi$ from R° to $\mathrm{End}(M)$; it is unital if and only if
$\psi$ maps the identity of R° to the identity endomorphism. In other words, a
left $R$-module is a right R°-module.
Of course, all this confusing complication about left and right modules is
unnecessary if the ring R is commutative. In this case, we simply speak of 
R-modules, without specifying left or right. This explains why we did not meet 
the left-right distinction when we were studying vector spaces. 


5.2 Examples of modules. 


Example 1 If $F$ is a field, a vector space over $F$ is an $F$-module (left or
right), and conversely. Modules generalise vector spaces.


Example 2 Let $R$ be any ring. Then we can make $R$ into either a left or a right
$R$-module, as follows. In either case, we take the module addition to be the ring
addition. Also, in either case, we take the scalar multiplication to be the ring
multiplication, but interpreted differently: for a right module, $r_1r_2$ is the
result of multiplying the module element $r_1$ by the ring element $r_2$; for a
left module, $r_2$ is the module element, and $r_1$ the scalar.

If R has an identity, then these modules are automatically unital. 

We call these modules the free right and left R-modules of rank 1. 


Example 3 This generalises Example 2. Let n be a positive integer, and let R^n
denote the set of n-tuples of elements of R. We make R^n into a right R-module
by defining


(x_1, x_2, ..., x_n) + (y_1, y_2, ..., y_n) = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n),


(x_1, x_2, ..., x_n)r = (x_1 r, x_2 r, ..., x_n r),

or a left module by defining


(x_1, x_2, ..., x_n) + (y_1, y_2, ..., y_n) = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n),


r(x_1, x_2, ..., x_n) = (r x_1, r x_2, ..., r x_n).


These are the free right and left R-modules of rank n. 

Theorem 4.10 shows that any finite-dimensional vector space is a free module 
of rank n for a unique value of n. For general rings, things are much more 
complicated: most modules are not free, and free modules of different ranks may 
be isomorphic. 


Example 4 Any module is an abelian group, if we forget the scalar multipli- 
cation and consider just the addition. Can we go back from abelian groups to 
modules? It turns out that any abelian group can be made into a Z-module in a
natural way.

Let M be an abelian group (whose operation is written as +). Now define a 
scalar multiplication by Z as follows: 


if n > 0 and x ∈ M, then nx = x + x + ... + x (n terms);
0x = 0, where the second zero is the identity element of M;
if n = -m where m > 0, then nx = -(mx).

So we have the general principle: Abelian groups are the same thing as
Z-modules.
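
The three rules defining the scalar multiplication by Z can be written out as a short Python sketch. The function name z_scalar and the sample group (pairs in Z_4 ⊕ Z_6, added componentwise) are illustrative assumptions, not part of the text; any abelian group supplied as an addition, a negation and a zero element would serve equally well.

```python
def z_scalar(n, x, add, neg, zero):
    """Return nx in the abelian group described by (add, neg, zero)."""
    if n == 0:
        return zero                      # 0x = 0
    if n < 0:
        # n = -m with m > 0: nx = -(mx)
        return neg(z_scalar(-n, x, add, neg, zero))
    result = x
    for _ in range(n - 1):               # x + x + ... + x (n terms)
        result = add(result, x)
    return result


# Hypothetical example group: Z_4 (+) Z_6 with componentwise addition.
def add(a, b):
    return ((a[0] + b[0]) % 4, (a[1] + b[1]) % 6)


def neg(a):
    return ((-a[0]) % 4, (-a[1]) % 6)


zero = (0, 0)
```

For instance, z_scalar(3, (1, 1), add, neg, zero) gives (3, 3), and multiplying any element by 12 gives the zero element, so 12 lies in the annihilator of this particular Z-module.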


Example 5 Let M = F^n, where F is a field. Then M is a vector space over
F, and hence a (left) F-module. But it is also a right M_n(F)-module, where
M_n(F) is the ring of n × n matrices over F. (‘Scalar multiplication’ of a vector
by a matrix is given by the usual formula for matrix multiplication, regarding
the vector as a 1 × n matrix.)

A structure M like this, which is a left module for R and a right module for 
S and satisfies the additional axiom 

(MB) For all r ∈ R, s ∈ S and m ∈ M, we have (rm)s = r(ms),

is called an R-S bimodule.


Example 6 This is an important example for applications. 

Let V be a vector space over F, and let S be a linear transformation from V
to V. For any polynomial f ∈ F[x], we can define a linear transformation f(S)
on V as follows: if


f(x) = a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0,

then we put

f(S) = a_n S^n + a_{n-1} S^{n-1} + ... + a_1 S + a_0 I,


where S^n is the n-fold composition of S with itself (v S^n = (...(vS)S...)S,
where S occurs n times), and I is the identity transformation.
Now we make V into an F[x]-module by the rule


v · f(x) = v f(S).


We will see that the structure of this module reflects the properties of S in a 
very precise way. 
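
As a small illustration of this module structure, the following Python sketch computes the action of a polynomial f on a row vector v, namely v f(S), with the same right-action convention (vS) used in the text. The names vec_mat and act and the sample matrix S are assumptions made for this example only.

```python
def vec_mat(v, S):
    """Row vector times square matrix: returns vS."""
    n = len(S[0])
    return [sum(v[i] * S[i][j] for i in range(len(v))) for j in range(n)]


def act(v, coeffs, S):
    """Return v f(S), where f(x) = coeffs[0] + coeffs[1] x + coeffs[2] x^2 + ..."""
    result = [0] * len(v)
    power = list(v)                      # v S^0 = v
    for a in coeffs:
        result = [r + a * p for r, p in zip(result, power)]
        power = vec_mat(power, S)        # move on to v S^(k+1)
    return result


# Sample transformation of a 2-dimensional space (an arbitrary choice).
S = [[0, 1],
     [-2, 3]]
```

Here act([1, 0], [0, 1], S) is just vS = [0, 1], while act(v, [2, -3, 1], S) is [0, 0] for every v, because for this particular S one checks directly that S^2 - 3S + 2I = 0. In module language, the polynomial x^2 - 3x + 2 annihilates this F[x]-module.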


Example 7 Here is a further generalisation of the free R-modules of rank 1
from Example 2. If I is an ideal in the ring R, then both I and R/I are (right or
left) R-modules. For I, we take the addition and multiplication of R, but only
add two elements of I, or multiply an element of I by an element of R. (Both of
these operations yield results in I, by the definition of an ideal.)

For R/I, the construction is similar. We use the addition defined in the
factor ring, and scalar multiplication (I + x)r = I + xr (for a right module) or
r(I + x) = I + rx (for a left module). It is necessary to check that these operations
are well defined, as well as proving the module axioms.


5.3 Submodules and homomorphisms. We formulate the notions of submodule
and module homomorphism for right modules. The definitions for left
modules are very similar. 

Definition Let M be a right R-module. A submodule N of M is a subset
of M which is an R-module (with respect to the addition and scalar
multiplication of M).


As usual, in order to test whether N is a submodule, it suffices to check the 
closure laws, since all other laws hold automatically. Accordingly we have 


Theorem 5.1 (Submodule Test) The non-empty subset N of M is a sub- 
module if and only if it is closed under subtraction and scalar multiplication. 


Proof The closure conditions are clearly necessary. Suppose that they hold. 
Closure under subtraction ensures that N is a subgroup of the abelian group 
(M,+). So the result follows. 


Let M and N be right R-modules. An (R-module) homomorphism from M
to N is a map θ: M → N which preserves addition and scalar multiplication:


(m_1 + m_2)θ = m_1 θ + m_2 θ,
(mr)θ = (mθ)r,


for all m, m_1, m_2 ∈ M and r ∈ R. (The second equation indicates that the
homomorphism does not affect the scalars in R.)

An isomorphism is a homomorphism which is one-to-one and onto. 

The image Im(θ) of a homomorphism θ is {mθ : m ∈ M}, and the kernel
Ker(θ) is {m ∈ M : mθ = 0}.

If K is a submodule of M, then we can define a factor module M/K, 
whose elements are the cosets of K in M. (These cosets are defined because K is 
a subgroup of the abelian group (M,+). For the same reason, addition on M/K 
is well defined.) Now scalar multiplication on M/K is given by the rule 


(K + m)r = K + mr.


As usual, we can check that this is well defined, and that M/K is an R-module.


Theorem 5.2 Let θ: M → N be an R-module homomorphism. Then the image
and kernel of θ are submodules of N and M respectively; and M/Ker(θ) ≅ Im(θ)
(as R-modules).


The proof is an exercise. 


5.4 Annihilators, cyclic modules, direct sums. This section develops a 
few tools of module theory, which will be applied in the next. We assume from 
now on that our rings are commutative and have identities, and that our modules 
(for which the left-right distinction is not necessary) are unital. 


Definition Let R be a commutative ring with identity, and M a unital 
R-module. The annihilator of M, written Ann(M), is the set 


{r ∈ R : mr = 0 for all m ∈ M}.

The submodule of M generated by m_1, ..., m_n is the set

(m_1, ..., m_n) = {m_1 r_1 + ... + m_n r_n : r_1, ..., r_n ∈ R}.


We say that M is finitely generated if there is a finite set of elements of 
M which generates M. We say that M is cyclic if it is generated by just one 
element. 


Remark In the case where the module M is equal to R (as in Example 2),
the submodule generated by a set of elements is exactly the same as the ideal 
generated by these elements; this is why we use the same notation. 


Here is a general result about these concepts. 


Theorem 5.3 Let M be a unital module over a commutative ring R with 
identity. 


(a) Ann(M) is an ideal of R.
(b) For any m_1, ..., m_n ∈ M, the set (m_1, ..., m_n) is a submodule of M, and
is the smallest submodule containing m_1, ..., m_n.


(c) If M is cyclic, then M ≅ R/Ann(M) (as R-module).


Proof (a) We apply the Ideal Test.
If r_1, r_2 ∈ Ann(M), then m r_1 = m r_2 = 0, so m(r_1 + r_2) = m r_1 + m r_2 = 0, and
r_1 + r_2 ∈ Ann(M).
If r ∈ Ann(M) and s ∈ R, then mr = 0, so m(rs) = (mr)s = 0, so that
rs ∈ Ann(M).
Thus Ann(M) is an ideal of R.
(b) We apply the Submodule Test:
If m_1 r_1 + ... + m_n r_n, m_1 s_1 + ... + m_n s_n ∈ (m_1, ..., m_n), then the sum of these
elements is


m_1 r_1 + ... + m_n r_n + m_1 s_1 + ... + m_n s_n


= m_1(r_1 + s_1) + ... + m_n(r_n + s_n) ∈ (m_1, ..., m_n).


If m_1 r_1 + ... + m_n r_n ∈ (m_1, ..., m_n) and s ∈ R, then
(m_1 r_1 + ... + m_n r_n)s = m_1(r_1 s) + ... + m_n(r_n s) ∈ (m_1, ..., m_n).
So (m_1, ..., m_n) is a submodule.


Moreover, this submodule contains


m_i = m_1·0 + ... + m_i·1 + ... + m_n·0


for i = 1, ..., n. (Here we used the unital property of M.) Any submodule N of
M which contains m_1, ..., m_n contains all multiples m_i r_i, and hence all linear
combinations m_1 r_1 + ... + m_n r_n of these elements. So (m_1, ..., m_n), as defined,
is the smallest submodule containing these elements.

(c) Let M = (m) be a cyclic R-module with annihilator I. We define a map
θ: M → R/I by the rule


(mr)θ = I + r.


This will be the required isomorphism. First, we show that it is well defined and
one-to-one. Note that r ∈ I if and only if mr = 0. For, if r ∈ I, then mr = 0,
since I is the annihilator of M and m ∈ M. Conversely, suppose that mr = 0.
Take any element of M; by assumption, it has the form mx for some x ∈ R.
Then (mx)r = m(xr) = m(rx) = (mr)x = 0. So r ∈ Ann(M) = I.
Now

m r_1 = m r_2 ⇔ m(r_1 - r_2) = 0
⇔ r_1 - r_2 ∈ I
⇔ I + r_1 = I + r_2,


so θ is well defined and one-to-one.
Clearly, θ is onto. Finally,


(m r_1)θ + (m r_2)θ = (I + r_1) + (I + r_2) = I + (r_1 + r_2) = (m(r_1 + r_2))θ,

and

((mr)θ)s = (I + r)s = I + rs = (mrs)θ,

so θ is a module isomorphism.


Remark Conversely, if I is an ideal of R, then R/I is a cyclic R-module,
generated by the coset I + 1: for any element of R/I has the form I + r = (I + 1)r
for some r ∈ R.


Definition Let M = (m_1, ..., m_n) be a finitely generated R-module. We say
that M is freely generated by m_1, ..., m_n if


m_1 r_1 + ... + m_n r_n = 0 implies r_1 = ... = r_n = 0.

M is free if it is freely generated by some finite set.


Theorem 5.4 Let M be an R-module, where R is a commutative ring with
identity.


(a) M is freely generated by m_1, ..., m_n if and only if every element of M can
be uniquely expressed in the form m_1 r_1 + ... + m_n r_n, for r_1, ..., r_n ∈ R.
(b) M is free if and only if it is isomorphic to R^n for some natural number n.

Proof (a) The condition that every element of M has an expression of this
form is just the statement that M = (m_1, ..., m_n). The uniqueness of the
representation is equivalent to the definition of freeness, since m_1 r_1 + ... + m_n r_n =
m_1 s_1 + ... + m_n s_n if and only if m_1(r_1 - s_1) + ... + m_n(r_n - s_n) = 0.

(b) Suppose that M is freely generated by m_1, ..., m_n, so that every element
of M can be written uniquely in the form m_1 r_1 + ... + m_n r_n. Then the map
θ: M → R^n given by


(m_1 r_1 + ... + m_n r_n)θ = (r_1, r_2, ..., r_n)


is easily checked to be an R-module isomorphism.
Conversely, R^n is freely generated by e_1, ..., e_n, where e_i is the n-tuple with
1 in the ith position and 0 in all other positions, since


(r_1, ..., r_n) = e_1 r_1 + ... + e_n r_n.


Definition Let M and N be R-modules. The direct sum M ⊕ N of M and
N is the set of all ordered pairs (m, n), with m ∈ M and n ∈ N, with addition
given by


(m_1, n_1) + (m_2, n_2) = (m_1 + m_2, n_1 + n_2),

and scalar multiplication by

(m, n)r = (mr, nr).


Proposition 5.5 If M and N are R-modules, then M ⊕ N is an R-module.

The proof is an exercise.


The direct sum of modules can be extended to the sum of any finite number 
of terms, in an obvious way. The next result enables us to recognise a direct sum. 


Theorem 5.6 Let M be an R-module, where R is a commutative ring with
identity. Suppose that M contains submodules M_1, M_2, ..., M_n such that any
element of M can be uniquely written as m_1 + m_2 + ... + m_n, with m_i ∈ M_i for
i = 1, ..., n. Then M is isomorphic to the direct sum of M_1, ..., M_n.


Proof We define a map θ: M_1 ⊕ ... ⊕ M_n → M by the rule

(m_1, m_2, ..., m_n)θ = m_1 + m_2 + ... + m_n.


The hypothesis of the theorem guarantees that this mapping is one-to-one and
onto, and it is easily checked that it is a homomorphism.


Remark The free module R^n is isomorphic to the direct sum of n copies of
the free module R of rank 1. 

Exercise 5.1 Show that the set of all m × n matrices over F is an M_m(F)-M_n(F)
bimodule (with the usual matrix addition and multiplication).


Exercise 5.2 Let M be an R-module, and let I = Ann(M). Show that M can be
regarded as an R/I-module, where scalar multiplication is given by the rule


m(I + r) = mr.


Exercise 5.3 Let R be a commutative ring with identity. 


(a) Prove that, if M is an R-module generated by a single element, then M ≅
R/Ann(M) (where R/Ann(M) is an R-module as in Example 7).

(b) Conversely, show that, if I is an ideal of R, then R/I (as R-module) is generated
by a single element.

(c) Show that, if I and J are ideals of R, then there is an R-module homomorphism
from R/I onto R/J if and only if I ⊆ J.


Remark Here we see the ideal structure of R reflected in the 1-generator 
R-modules and their epimorphisms. 


Modules over a Euclidean domain 


5.5 The structure theorem. In general, modules can exist in enormous 
profusion. In order to understand a ring, we should study its modules. In this 
section, we examine finitely generated modules over Euclidean rings, and prove 
a structure theorem. 


Theorem 5.7 A finitely generated module over a Euclidean domain is isomor- 
phic to a direct sum of cyclic modules. 


This gives a very precise description of the structure of such modules. The 
direct sum is an explicit construction. Any cyclic R-module is isomorphic to R/I
for some ideal I. If R is a Euclidean domain, then all its ideals are principal, so
I = (r) for some r ∈ R, unique up to associate.

The theorem also has important applications, as we will see. 

The theorem will be deduced from another, seemingly unrelated theorem, 
which gives more detailed information. 


Theorem 5.8 Let R be a Euclidean domain, M a free module of rank n, and
N a submodule of M. Then there exist elements m_1, ..., m_n ∈ M, a natural
number r ≤ n, and elements d_1, ..., d_r ∈ R such that


(a) M is freely generated by m_1, ..., m_n;
(b) N is freely generated by m_1 d_1, ..., m_r d_r;
(c) d_i divides d_{i+1}, for i = 1, ..., r-1.


Proof We will need to show that N is finitely generated. Then we will find that 
the rest of the work has already been done. So we begin with the assumption 
that N is finitely generated, and return later to justify this assumption. 

We identify M with R^n, where n is its rank. Let x_1, ..., x_m be generators
of N. Each of these generators is an element of R^n, so we can summarise this
information by taking them as the rows of a matrix A, of size m × n, with
elements in R. This matrix determines N; indeed, in the terminology of vector
spaces, N is the ‘row space’ of A.

According to the Smith normal form Theorem 4.29, we can bring A to the form

[ D  O ]
[ O  O ],

where D is a diagonal matrix of size r × r whose diagonal entries satisfy
d_i | d_{i+1} for i = 1, ..., r-1, and O denotes a zero matrix of the appropriate
size. What would the submodule corresponding to a matrix of this form look
like? If e_i denotes the element of R^n with 1 in the ith position and zeros
elsewhere, then M is the free module generated by e_1, ..., e_n, and N a submodule
freely generated by d_1 e_1, ..., d_r e_r. In other words, the conclusions of the theorem
would hold.


So we have to examine how the effect of elementary row and column 
operations on the matrix A translates into the module M and submodule N. 


• Elementary row operations merely change the generating set for N. For
suppose the rows of A are x_1, ..., x_m as before. For each type of row
operation, the submodule closure conditions imply that the new rows (such as
x_i + x_j c or x_i c with c a unit) belong to N. So the submodule N’ generated
by the new rows is contained in N. However, in each case, we can undo the
effect of the operation by another operation of the same kind; so a similar
argument shows that N is contained in N’, whence N = N’.

• Elementary column operations change the free basis for M. We will prove
this for operations of type (Ecl); the argument for the other types is similar
but easier. So, as above, let e_1, ..., e_n be the free basis for M. Consider what
happens when we replace e_j by e_j - e_i c, for some c ∈ R. It is still true that
any element of M can be expressed uniquely in terms of these elements: for
example,


e_1 r_1 + ... + e_n r_n = e_1 r_1 + ... + e_i(r_i + r_j c)


+ ... + (e_j - e_i c)r_j + ... + e_n r_n.


So we do have again a basis for M. In terms of coordinates, we have added c
times the jth coordinate to the ith. If this process is done for every generator
of N (every row of A), the result is to apply to A the column operation
consisting of adding c times the jth column to the ith. In other words, the
rows of the transformed matrix are the generators of N, expressed with
respect to a different basis for M.


The conclusion is that elementary operations do not change the module M or 
the submodule N, but merely our representation of them. Hence we may indeed 
assume that N is the row space of a matrix in the Smith normal form, and the 
theorem is proved. 




It remains to show that N is indeed finitely generated, so that it can be 
represented as the row space of a matrix. The proof is by induction on n, there 
being nothing to prove when n = 0. 

Let J be the set of all elements of R which occur as first component of an
element of N:


J = {r_1 ∈ R : ∃ r_2, ..., r_n ∈ R with (r_1, ..., r_n) ∈ N}.


Since N is a submodule, it is closed under subtraction and multiplication by
elements of R. It follows that the same is true for J, which is thus an ideal of R.
Since R is a principal ideal domain, we have J = (a) for some a ∈ R.

Choose b_2, ..., b_n ∈ R such that (a, b_2, ..., b_n) ∈ N. Also, let


N_1 = {(r_1, ..., r_n) ∈ N : r_1 = 0}.

Now take any element (r_1, ..., r_n) ∈ N. We have r_1 = as for some s ∈ R. Then

(r_1, ..., r_n) - (a, b_2, ..., b_n)s ∈ N_1.


Hence N is generated by (a, b_2, ..., b_n) together with a generating set for N_1.
But N_1 is a submodule of R^{n-1} (on dropping the first coordinate, which is
zero), hence finitely generated (by induction); and so
N is finitely generated.

This concludes the proof of the theorem. 

Let M be an arbitrary finitely generated R-module. Suppose that M =
(m_1, ..., m_n). Define a map θ: R^n → M by the rule


(r_1, r_2, ..., r_n)θ = m_1 r_1 + m_2 r_2 + ... + m_n r_n.


It is straightforward to show that θ is an R-module homomorphism, and that
Im(θ) = M. Let N = Ker(θ).

By the submodule theorem (Theorem 5.8), we can choose a basis e_1, e_2, ..., e_n
for R^n such that, for some d_1, ..., d_r ∈ R,


• e_1 d_1, ..., e_r d_r is a basis for N;
• d_i divides d_{i+1} for i = 1, ..., r-1.


We claim that M is isomorphic to the direct sum of the cyclic submodules
(e_1 θ), ..., (e_n θ), and that the annihilator of (e_i θ) is the ideal (d_i) of R (so that
(e_i θ) ≅ R/(d_i)). Here we put d_i = 0 for i > r.

Using the characterisation of direct sums in Theorem 5.6, we have to show
that every element of M is uniquely expressible in the form x_1 + x_2 + ... + x_n,
with x_i ∈ (e_i θ) for all i. That any element can be so represented follows from
the fact that e_1, ..., e_n is a basis for R^n. Suppose that


x_1 + ... + x_n = y_1 + ... + y_n,


with x_i, y_i ∈ (e_i θ). Let x_i = e_i θ r_i, and y_i = e_i θ s_i. Then Σ e_i(r_i - s_i) ∈ Ker(θ) =
N. By the structure of N, we conclude that d_i divides (r_i - s_i) for all i, so that
e_i(r_i - s_i) ∈ Ker(θ), and so e_i θ r_i = e_i θ s_i for all i, as required.

We have done better than originally promised. What we have actually proved
is the following theorem:


Theorem 5.9 Let M be a finitely generated module over a Euclidean domain R.
Then there exist elements d_1, ..., d_n ∈ R such that


(a) none of the d_i are units;

(b) d_i divides d_{i+1}, for i = 1, ..., r-1;
(c) d_i = 0 for i > r;

(d) M ≅ R/(d_1) ⊕ ... ⊕ R/(d_n).


Note If d_i is a unit, then (d_i) = R, and so R/(d_i) = {0}. Taking the direct
sum of a module with {0} does not change the module. So we can delete all the
units from among the d_i. (By the divisibility condition, they will occur only at
the start.)

Note also that R/{0} ≅ R, so the number of direct summands isomorphic to
R is n - r.


Definition The number of indices i such that d_i = 0 is called the torsion-
free rank of the module M, and the non-zero ring elements d_1, ..., d_r are the
invariant factors. They form a complete set of invariants for the module. Thus,
two finitely generated R-modules are isomorphic if and only if they have the same 
torsion-free rank and the same invariant factors (up to associates). The meaning 
of the term ‘invariant factors’ should be fairly clear; that of ‘torsion-free rank’ 
somewhat more mysterious. Some light will be shed on this strange terminology 
in the section on abelian groups. 


5.6 The primary decomposition. A module can often be written in many 
different ways as a direct sum of submodules. In this section, we discuss a par- 
ticular decomposition for torsion modules over a principal ideal domain, which 
will lead to a simpler canonical form for matrices. 


Proposition 5.10 Let R be a principal ideal domain. Let M be an R-module
for which Ann(M) = (r) is non-zero. Suppose that r = r_1 r_2, where r_1 and r_2
are coprime. Then M = M_1 ⊕ M_2, where M_1 and M_2 are submodules of M with
Ann(M_1) = (r_1) and Ann(M_2) = (r_2).


For example, the cyclic group C_6 of order 6 is a Z-module with annihilator
(6); and we saw that C_6 ≅ C_2 ⊕ C_3, where the subgroups C_2 and C_3 have
annihilators (2) and (3) respectively.


Proof Let M_1 = {m ∈ M : m r_1 = 0}, and similarly M_2 = {m ∈ M :
m r_2 = 0}. Then M_1 and M_2 are submodules:


m, n ∈ M_1 ⇒ m r_1 = n r_1 = 0 ⇒ (m + n) r_1 = 0 ⇒ m + n ∈ M_1,


m ∈ M_1, r ∈ R ⇒ m r_1 = 0 ⇒ (mr) r_1 = (m r_1) r = 0 ⇒ mr ∈ M_1,


and similarly for M_2.

Now Ann(M_1) is an ideal of R, and hence is of the form (s_1) for some s_1.
By definition, r_1 ∈ Ann(M_1), so s_1 divides r_1, say r_1 = s_1 x. Now take any
element m ∈ M. Then 0 = mr = (m r_2) r_1; so, by definition, m r_2 ∈ M_1. Then
m r_2 s_1 = 0. Since m was arbitrary, s_1 r_2 ∈ Ann(M) = (r_1 r_2), so r_1 r_2 divides s_1 r_2,
whence r_1 divides s_1. Since each of r_1 and s_1 divides the other, these elements
are associates, and so Ann(M_1) = (r_1). Similarly, Ann(M_2) = (r_2).

It remains to show that M = M_1 ⊕ M_2; equivalently, that any element m ∈ M
can be represented uniquely in the form m = m_1 + m_2 with m_1 ∈ M_1 and
m_2 ∈ M_2.

First, we show that such a representation exists. Since R is a principal ideal
domain, and (r_1, r_2) = (1), there exist x, y ∈ R such that x r_1 + y r_2 = 1. Then


m = m1 = m y r_2 + m x r_1;


and (m y r_2) r_1 = (my) r = 0, so m y r_2 ∈ M_1, and similarly m x r_1 ∈ M_2.

For the uniqueness, suppose that m_1 + m_2 = m_1’ + m_2’, where m_1, m_1’ ∈ M_1
and m_2, m_2’ ∈ M_2. Then m_1 - m_1’ = m_2’ - m_2, so this element lies in both M_1
and M_2. So it is enough to prove that M_1 ∩ M_2 = {0}. Take any element m
which lies in this intersection. Then m r_1 = m r_2 = 0. So


m = m1 = m(x r_1 + y r_2) = (m r_1)x + (m r_2)y = 0,


as required. This completes the proof.
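
The existence part of this argument is effectively an algorithm. A minimal Python sketch for the Z-module M = Z/(r_1 r_2), assuming r_1 and r_2 are coprime positive integers, is given below; the helper names ext_gcd and split are hypothetical and chosen only for this illustration.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = ext_gcd(b, a % b)
    return (g, y, x - (a // b) * y)


def split(m, r1, r2):
    """Write m in Z/(r1*r2) as m1 + m2 with m1*r1 = 0 and m2*r2 = 0 (mod r1*r2)."""
    n = r1 * r2
    g, x, y = ext_gcd(r1, r2)            # x*r1 + y*r2 == 1
    if g != 1:
        raise ValueError("r1 and r2 must be coprime")
    m1 = (m * y * r2) % n                # the component in M_1
    m2 = (m * x * r1) % n                # the component in M_2
    return m1, m2
```

With r_1 = 2 and r_2 = 3 this reproduces the decomposition of C_6 mentioned before the proof: split(5, 2, 3) returns (3, 2), and indeed 3 + 2 = 5, while 3·2 and 2·3 are both zero modulo 6.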


Reading the proof carefully, we see that there is an alternative description
of the submodules: M_1 = M r_2 = {m r_2 : m ∈ M}, and M_2 = M r_1 = {m r_1 :
m ∈ M}.


Theorem 5.11 (Primary decomposition) Let M be a module over a principal
ideal domain R. Let Ann(M) = (r), and suppose that r = p_1^{n_1} p_2^{n_2} ... p_k^{n_k},
where p_1, ..., p_k are distinct (non-associate) irreducibles and n_1, ..., n_k are
positive integers. Then


M = M_1 ⊕ ... ⊕ M_k,

where M_1, ..., M_k are submodules of M and Ann(M_i) = (p_i^{n_i}).


This follows immediately by induction from the previous result. Note that
there is no choice at all about the submodules M_i; for M_i consists of all elements
m ∈ M such that m p_i^{n_i} = 0. The M_i are called the primary components
of M.


Proposition 5.12 Let M be a finitely generated torsion module over a 
Euclidean domain. Then M is isomorphic to a direct sum of cyclic submodules 
whose annihilators are powers of irreducibles. 


Proof First we apply the primary decomposition to M, expressing it as a 
direct sum of submodules whose annihilators are powers of irreducibles. These 
submodules are finitely generated. (More generally, if M = (m_1, ..., m_k), then
Mr = (m_1 r, ..., m_k r).) So each of these submodules is a direct sum of cyclic
submodules, the annihilator of each of which is a power of an irreducible.

We could alternatively first decompose M as a direct sum of cyclic modules,
and then decompose each of these cyclic submodules according to the primary
decomposition. (Taking k = 1 in the parenthetical remark above, we see that
if M is cyclic, then so is Mr for any r ∈ R, and in particular the primary
constituents of M are cyclic.)


The generators of the annihilators of these cyclic submodules are called 
the elementary divisors of M. The elementary divisors are just the fac- 
tors obtained when the invariant factors of M are factorised into powers of 
irreducibles. Like the invariant factors, they are unique: 


Theorem 5.13 Suppose that M is expressed in two different ways as a direct 
sum of cyclic submodules whose annihilators are powers of irreducibles. Then the 
annihilators are the same, up to associates. 


Proof It is enough to prove this when Ann(M) is a prime power. (This follows 
because any cyclic submodule with prime power annihilator is contained in one 
of the primary components, and these are uniquely determined.) So assume that 
Ann(M) = (p^n), where p is irreducible.

Let M = M_1 ⊕ ... ⊕ M_k, where M_i is a cyclic module with annihilator (p^{n_i}),
where 1 ≤ n_i ≤ n.

Now (p) is a maximal ideal of R, and hence R/(p) is a field. Now consider
the submodule Mp of M, and let N be the factor module M/Mp. We claim that
M/Mp is a vector space over R/(p). The main point is that scalar multiplication
is well defined: if (p) + r = (p) + s, then s = r + px and so mr and ms differ
by an element of Mp. Now the cosets containing the generators m_1, ..., m_k of
the cyclic summands form a basis for this vector space; so dim(M/Mp) = k
(as R/(p)-vector space).


Now suppose that n_i > 1 for i ≤ k_1, while n_{k_1+1} = ... = n_k = 1. Then
Mp is the direct sum of cyclic modules generated by m_1 p, ..., m_{k_1} p. (We have
m_{k_1+1} p = ... = m_k p = 0, so these elements generate trivial submodules which
can be ignored.) So dim(Mp/Mp^2) = k_1 (as R/(p)-vector space).
Continuing in this way, we find that the dimension of Mp^j/Mp^{j+1} is equal
to the number of n_1, ..., n_k which are greater than j.
But, given a set of positive integers n_1, ..., n_k, if we are told how many of
them are greater than j for each j ≥ 0, then we can recover the numbers n_i.


For example, if the dimensions of the spaces M/Mp, Mp/Mp^2, Mp^2/Mp^3,
Mp^3/Mp^4, Mp^4/Mp^5 are respectively 7, 3, 3, 1, 0, then there are seven numbers
n_i, and they are 1, 1, 1, 1, 3, 3, 4.
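
The recovery step at the end of this proof can be made explicit: the number of n_i equal to j + 1 is the difference between the dimension at step j and the dimension at step j + 1. A short Python sketch follows; the function name exponents_from_dimensions is an assumption for this illustration, and the input is taken to be the list of dimensions of Mp^j/Mp^{j+1} for j = 0, 1, 2, ...

```python
def exponents_from_dimensions(dims):
    """dims[j] = number of exponents n_i greater than j; return the n_i in increasing order."""
    exponents = []
    for j, d in enumerate(dims):
        following = dims[j + 1] if j + 1 < len(dims) else 0
        # exactly (d - following) of the exponents are equal to j + 1
        exponents.extend([j + 1] * (d - following))
    return exponents
```

Applied to the dimensions 7, 3, 3, 1, 0 of the example, exponents_from_dimensions([7, 3, 3, 1, 0]) returns [1, 1, 1, 1, 3, 3, 4].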


Remark The elementary divisors are obtained from the invariant factors by 
factorising each of them into prime powers and taking all the prime powers 
obtained. Conversely, suppose that we are given the elementary divisors. Take the 
largest power of each irreducible which occurs, and multiply them all together;
this is the largest invariant factor. Now remove these elementary divisors and
repeat the procedure.


For example, if the elementary divisors are 2, 4, 4, 3, 9, 5, then the invariant
factors (in the reverse of the usual order) are 4·9·5 = 180, 4·3 = 12, and 2.
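
The procedure in this remark is easy to mechanise for Z-modules. The following Python sketch assumes the elementary divisors are given as a list of integer prime powers greater than 1; the function names prime_of and invariant_factors are hypothetical. It returns the invariant factors in the usual order, each dividing the next.

```python
from collections import defaultdict


def prime_of(q):
    """Smallest prime factor of q > 1 (for a prime power, this is its prime)."""
    p = 2
    while q % p != 0:
        p += 1
    return p


def invariant_factors(elementary_divisors):
    """Combine prime-power elementary divisors into invariant factors d_1 | d_2 | ..."""
    groups = defaultdict(list)
    for q in elementary_divisors:
        groups[prime_of(q)].append(q)
    for powers in groups.values():
        powers.sort(reverse=True)        # largest power of each prime first
    rounds = max((len(powers) for powers in groups.values()), default=0)
    factors = []
    for k in range(rounds):
        d = 1
        for powers in groups.values():
            if k < len(powers):
                d *= powers[k]           # k-th largest power of this prime
        factors.append(d)
    return factors[::-1]                 # smallest invariant factor first
```

For the example above, invariant_factors([2, 4, 4, 3, 9, 5]) returns [2, 12, 180].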


Applications 


5.7 Finitely generated abelian groups. We have seen that an abelian 
group is exactly the same thing as a Z-module. Since Z is our prototype of a 
Euclidean domain, we can immediately apply our structure theorem to obtain 
the structure of finitely generated abelian groups: 


Theorem 5.14 Let A be a finitely generated abelian group. Then

A ≅ C_{d_1} ⊕ ... ⊕ C_{d_r} ⊕ C_∞ ⊕ ... ⊕ C_∞,


where d_1, ..., d_r are positive integers with d_i | d_{i+1} for i = 1, ..., r-1, C_d is a
cyclic group of finite order d, and C_∞ is an infinite cyclic group.


The uniqueness part of the module structure theorem gives us extra 
information. 


Theorem 5.15 Suppose that


C_{d_1} ⊕ ... ⊕ C_{d_r} ⊕ C_∞ ⊕ ... ⊕ C_∞ ≅ C_{e_1} ⊕ ... ⊕ C_{e_s} ⊕ C_∞ ⊕ ... ⊕ C_∞,

where d_i, e_j > 1 for all i, j, d_i | d_{i+1} for i = 1, ..., r-1 and e_j | e_{j+1} for
j = 1, ..., s-1. Let the numbers of C_∞ summands of the two groups be u and
v respectively. Then u = v, r = s, and d_i = e_i for i = 1, ..., r.


Note that the theorem is false without the divisibility condition. For example,
C_2 ⊕ C_3 ≅ C_6.

In the abelian group C_{d_1} ⊕ ... ⊕ C_{d_r} ⊕ C_∞ ⊕ ... ⊕ C_∞, we call the sum
C_{d_1} ⊕ ... ⊕ C_{d_r} of the finite cyclic groups the torsion part, and the sum of the
infinite cyclic groups the torsion-free part; the number of infinite summands
is the torsion-free rank of the group. Note that the torsion part consists of all
the elements of finite order.

Where does this terminology come from? The answer lies in the field of 
‘algebraic topology’, and it would take another book to explain it in detail; 
what follows is only a rough sketch. One of the central problems of topology 
is how to distinguish between different topological spaces (surfaces, etc.), or to 
decide whether two quite different recipes give the same or different spaces. (The 
kind of recipe that we are thinking of here can be described by example. If we 
take a rectangular strip of paper, bend it round, and join the ends, we obtain 
a cylinder. If we give the end a 180° twist before joining the ends, we obtain 
instead a Möbius band.)

Topologists discovered that it is possible to associate a collection of abelian
groups with a space, so that if two spaces are the same, then the groups associated 
with them are isomorphic. These groups can be calculated from a description of 
the space (as above). So, if we calculate the groups from the descriptions of two 
spaces, and the groups turn out not to be isomorphic, then the spaces are really 
different. (The converse is false; different spaces may give the same groups.) 

Now the presence of elements of finite order (other than the identity) in the 
group indicates some ‘twisting’ of the space (as in the Möbius band). Hence
the name torsion elements was used for elements of finite order in an abelian 
group, and the group was called torsion-free if it has no elements of finite order 
except the identity. 

If we regard an abelian group as a Z-module, then a torsion element is one 
whose annihilator is not the zero ideal. Hence, as in the last section, we can 
generalise to modules over an arbitrary commutative ring: a torsion element 
of such a module is one whose annihilator is not the zero ideal, and a module is 
torsion-free if 0 is its only torsion element. 

Theorems 5.14 and 5.15 enable us to count abelian groups. The result is given 
in terms of a famous number-theoretic function. 


Definition The partition function p(n) is the function whose value on the 
positive integer n is the number of ways of writing n as the sum of positive 
integers, where the order of the summands is unimportant. 


For example, p(4) = 5, because 


4 = 3 + 1 = 2 + 2 = 2 + 1 + 1 = 1 + 1 + 1 + 1.


(We do not count 3 + 1 and 1 + 3 separately.)
By convention, p(0) = 1. 
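
To make the definition concrete, here is a small Python sketch that lists the partitions of n as non-increasing lists of positive integers, so that p(n) is the number of lists produced. The generator name partitions and its optional parameter largest are assumptions made for this illustration.

```python
def partitions(n, largest=None):
    """Yield each partition of n once, as a non-increasing list of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield []                         # the empty sum: the one partition of 0
        return
    for first in range(min(n, largest), 0, -1):
        # fix the largest part, then partition the remainder using parts <= first
        for rest in partitions(n - first, first):
            yield [first] + rest
```

For n = 4 this yields [4], [3, 1], [2, 2], [2, 1, 1], [1, 1, 1, 1], matching p(4) = 5; for n = 0 it yields the single empty list, matching the convention p(0) = 1.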


Proposition 5.16 Let f_a(n) be the number of abelian groups of order n (up to
isomorphism).
(a) If n = p_1^{a_1} ... p_r^{a_r}, where p_1, ..., p_r are distinct primes, then
f_a(n) = f_a(p_1^{a_1}) ... f_a(p_r^{a_r}).
(b) If p is prime, then f_a(p^m) = p(m) (the partition function of m).


Proof (a) By the primary decomposition, an abelian group of order n is
the direct sum of abelian groups of orders p_1^{a_1}, ..., p_r^{a_r}; it determines, and is
determined by, the choices of these groups.

(b) To each expression m = a_1 + a_2 + ... + a_r corresponds a group, the direct
sum of cyclic groups of orders p^{a_1}, p^{a_2}, ..., p^{a_r}. These groups are all different,
and every abelian group of order p^m is isomorphic to one of them.


For example, the number of abelian groups of order 108 = 2^2 · 3^3 is p(2)p(3) =
2 · 3 = 6. The groups are given in the following table, which lists the forms given
by the invariant factors as well as the elementary divisors:


C_4 ⊕ C_27 ≅ C_108,
C_2 ⊕ C_2 ⊕ C_27 ≅ C_2 ⊕ C_54,
C_4 ⊕ C_3 ⊕ C_9 ≅ C_3 ⊕ C_36,
C_2 ⊕ C_2 ⊕ C_3 ⊕ C_9 ≅ C_6 ⊕ C_18,
C_4 ⊕ C_3 ⊕ C_3 ⊕ C_3 ≅ C_3 ⊕ C_3 ⊕ C_12,
C_2 ⊕ C_2 ⊕ C_3 ⊕ C_3 ⊕ C_3 ≅ C_3 ⊕ C_6 ⊕ C_6.


It can be shown, from the above result, that the number of abelian groups of 
order n is not greater than n (Exercise 5.4). 
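
Proposition 5.16 translates directly into a computation: factorise n and multiply the partition numbers of the exponents. The Python sketch below is one way to do this; the names partition_count, factorise and abelian_group_count are assumptions for the illustration, and trial division is used only because it is short.

```python
def partition_count(m):
    """Number of partitions p(m), counted by allowing parts 1, 2, ..., m in turn."""
    ways = [1] + [0] * m                 # ways[t] = partitions of t using parts seen so far
    for part in range(1, m + 1):
        for total in range(part, m + 1):
            ways[total] += ways[total - part]
    return ways[m]


def factorise(n):
    """Return {prime: exponent} for a positive integer n, by trial division."""
    exponents = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exponents[n] = exponents.get(n, 0) + 1
    return exponents


def abelian_group_count(n):
    """Number of abelian groups of order n up to isomorphism, via Proposition 5.16."""
    count = 1
    for exponent in factorise(n).values():
        count *= partition_count(exponent)
    return count
```

For n = 108 this gives p(2)·p(3) = 2·3 = 6, in agreement with the table above.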


5.8 Normal forms of matrices. In this section, we tackle a problem which 
appears similar to the canonical form under equivalence (see Section 4.9). There, 
we began with a linear transformation θ : U → V, and remarked that any
choice of bases in U and V gives rise to a matrix representing θ, and matrices
A, B representing the same transformation relative to different bases are related
by B = PAQ^{-1}. Moreover, there is a choice of bases such that the matrix
representing θ has the simple form

[ I  O ]
[ O  O ].

The submatrix I is r × r, where
r is the rank of θ (the dimension of its image), so there is a unique matrix of this
form which represents θ.

The situation we consider here is that θ is a linear transformation from a
vector space V to itself. We represent θ by choosing a basis {v_1, ..., v_n} for V,
and letting v_i θ = Σ_j a_{ij} v_j; then θ is represented by the matrix A = (a_{ij}). The
difference is that, instead of choosing two different bases in the source and target
spaces, we have only the freedom to choose one basis. If we use a different basis,
with transition matrix P, then the new matrix B representing θ is given by
B = PAP^{-1}. (Because there is only one basis to change, we must have P = Q
in the earlier formalism.) Can we find a set of ‘simple’ matrices so that each 
linear transformation from V to itself can be represented by one of them? Since 
our freedom to transform is less, there will be more matrices in such a set. (We 
are seeking canonical forms for a ‘finer’ equivalence relation than before.) 

We solve this problem by using 6 to make V into a module for the polynomial 
ring Fx], such that the module captures the structure of 0. As in Example 6 of 
Section 5.2, we make V an F[z]-module by setting 


) The submatrix I is r x r, where 


where powers of @ are calculated by composition. This module is finitely 
generated, since a basis for it as F-vector space certainly generates it as a module. 

We will apply the structure theorems for modules over Euclidean domains
(since F[x] is certainly a Euclidean domain). First, we have to see what cyclic
modules look like.

Definition Let f(x) be a monic polynomial in F[x], say
$$f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0.$$
The companion matrix C(f) is the $n \times n$ matrix given by
$$C(f) = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 0 & 1 \\
-a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1}
\end{pmatrix}.$$
(In other words, in the first n − 1 rows, the entries immediately to the right of
the diagonal are 1, and all others zero; the last row consists of the coefficients of
f (excluding the coefficient of $x^n$), in reverse order and with the sign changed.)
Furthermore, if m is a positive integer, let the extended companion matrix
C(m, f) be the matrix with $m \times m$ blocks, each $n \times n$, given by
$$C(m, f) = \begin{pmatrix}
C & J & O & \cdots & O & O \\
O & C & J & \cdots & O & O \\
\vdots & & & \ddots & & \vdots \\
O & O & O & \cdots & C & J \\
O & O & O & \cdots & O & C
\end{pmatrix},$$
with C on the diagonal, J immediately to the right, and zeros elsewhere, where
C = C(f) is the companion matrix of f, and J is a matrix with 1 in the south-west
corner (row n and column 1) and zero elsewhere. Note that C(f) = C(1, f).
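The two definitions above translate directly into code. Here is a small sketch in Python, using nested lists for matrices; the function names and the convention of passing the coefficients as the list `[a_0, ..., a_{n-1}]` (leading coefficient 1 omitted) are assumptions made for this illustration.

```python
def companion(coeffs):
    """C(f) for monic f(x) = x^n + a_{n-1} x^{n-1} + ... + a_0.

    `coeffs` is the list [a_0, a_1, ..., a_{n-1}].
    """
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        C[i][i + 1] = 1                 # 1s immediately right of the diagonal
    C[n - 1] = [-a for a in coeffs]     # last row: -a_0, ..., -a_{n-1}
    return C


def extended_companion(m, coeffs):
    """C(m, f): m diagonal blocks C(f), with a block J to the right of each."""
    n = len(coeffs)
    C = companion(coeffs)
    M = [[0] * (m * n) for _ in range(m * n)]
    for b in range(m):
        for i in range(n):
            for j in range(n):
                M[b * n + i][b * n + j] = C[i][j]
        if b < m - 1:
            # J has a single 1, in its last row and first column.
            M[b * n + n - 1][(b + 1) * n] = 1
    return M
```

For example, `companion([1, 0])` is the companion matrix of $x^2 + 1$, and `extended_companion(3, [-2])` is $C(3, x - 2)$, a $3 \times 3$ matrix with 2 on the diagonal and 1 immediately to its right.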


Proposition 5.17 Let $\theta$ be a linear transformation on V, and suppose that the
corresponding F[x]-module is cyclic, with annihilator $(g^m)$. Then there is a basis
for V relative to which the matrix of $\theta$ is C(m, g).


Proof Suppose that $\theta : V \to V$ is a linear transformation, and V is a cyclic
F[x]-module, say V = vF[x]. We claim that V has a basis $v_1, \dots, v_n$, where
$v_i = v\theta^{i-1}$ for i = 1, ..., n. It is clear that the vectors $v, v\theta, v\theta^2, \dots$ span V.
Choose n minimal such that $v, v\theta, \dots, v\theta^n$ are linearly dependent. Then it must
be the case that the coefficient of $v\theta^n$ in such a linear combination is non-zero;
so $v\theta^n$ can be expressed as a linear combination of $v, v\theta, \dots, v\theta^{n-1}$. Hence the
span of these vectors is mapped to itself by $\theta$, and hence is the whole of V
(since V = vF[x]). By minimality of n, the vectors $v, v\theta, \dots, v\theta^{n-1}$ are linearly
independent; so they form a basis for V.

Set $v_i = v\theta^{i-1}$ for i = 1, ..., n. Then $v_i\theta = v_{i+1}$ for i = 1, ..., n − 1. Suppose
that
$$v_n\theta = -a_0v_1 - \cdots - a_{n-1}v_n$$
for some $a_0, \dots, a_{n-1} \in F$. Then the matrix representing $\theta$ relative to this basis
is the companion matrix of
$$f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0.$$
In addition, $vf(\theta) = 0$. It follows that, for any polynomial h, we have
$$(vh(\theta))f(\theta) = (vf(\theta))h(\theta) = 0.$$
Since V = vF[x], we see that Ann(V) = (f). So we have now proved the
proposition for m = 1.

We showed above that $\{v\theta^{i-1} : 1 \le i \le n\}$ is a basis for the cyclic module V,
where n = dim(V). It is easy to see that, if $f_{i-1}(x)$ is any polynomial of degree
i − 1, for $1 \le i \le n$, then the set $\{vf_{i-1}(\theta) : 1 \le i \le n\}$ is also a basis.

Suppose that Ann(V) = (f) where $f = g^m$, and g is a polynomial of degree
k (with km = n). Then $x^{i-1}g(x)^{j-1}$ is a polynomial of degree (i − 1) + k(j − 1).
Taking $1 \le i \le k$ and $1 \le j \le m$, these degrees take all values from 0 to km − 1,
so we obtain an alternative basis for V, namely the vectors $w_{ij} = v\theta^{i-1}g(\theta)^{j-1}$.

What is the matrix representing $\theta$ relative to this basis? We have
$w_{ij}\theta = w_{i+1,j}$ for i < k, and, if
$$g(x) = x^k + b_{k-1}x^{k-1} + \cdots + b_0,$$
then
$$w_{kj}\theta = w_{1j}\theta^k = -b_0w_{1j} - \cdots - b_{k-1}w_{kj} + w_{1,j+1},$$
with the convention that $w_{1,m+1} = 0$. Thus the matrix of $\theta$ is exactly C(m, g),
as claimed.


Theorem 5.18 (Normal forms of matrices) Let $\theta : V \to V$ be a linear
transformation, and regard V as an F[x]-module in the usual way.


(a) If the invariant factors of the module are $d_1(x), \dots, d_r(x)$, then there is
a basis for V relative to which $\theta$ is represented by a block diagonal matrix with
$C(d_1), \dots, C(d_r)$ on the diagonal and zeros elsewhere.


(b) If the elementary divisors of the module are $e_1(x)^{m_1}, \dots, e_k(x)^{m_k}$, then
there is a basis for V relative to which $\theta$ is represented by a block diagonal matrix
with $C(m_1, e_1), \dots, C(m_k, e_k)$ on the diagonal and zeros elsewhere.


Matrices of the shape described in (a) or (b) of this theorem are said to be
in rational canonical form or primary rational canonical form respectively.
(Note that, in the rational canonical form we require that $d_i$ divides $d_{i+1}$
for i = 1, ..., r − 1, while in the primary rational canonical form we require
that each $e_i$ is irreducible.) So we can restate the theorem in matrix form
as follows:


Theorem 5.19 Let A be an $n \times n$ matrix over F.


(a) There is an invertible $n \times n$ matrix P such that $PAP^{-1}$ is in rational
canonical form.


(b) There is an invertible $n \times n$ matrix Q such that $QAQ^{-1}$ is in primary
rational canonical form.

Moreover, the rational and primary rational canonical forms of a given matrix
are unique (up to the order of the diagonal blocks in the primary rational case).


There is one particularly important case of this theorem. Suppose that
the field F is algebraically closed. (Traditionally, this analysis is given for the
field C.) Then any non-constant polynomial has a root in F, and hence a linear
factor. So the only irreducible polynomials are those of the form x − a for $a \in F$.
Now C(x − a) is the $1 \times 1$ matrix (a). Hence C(m, x − a) has the form
$$\begin{pmatrix}
a & 1 & 0 & \cdots & 0 & 0 \\
0 & a & 1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & a & 1 \\
0 & 0 & 0 & \cdots & 0 & a
\end{pmatrix}.$$
Such a matrix is called a Jordan block. So the primary rational form
immediately gives the following result.


Theorem 5.20 (Jordan form) Let A be an $n \times n$ matrix over an algebraically
closed field F. Then there is an invertible $n \times n$ matrix Q over F such that
$QAQ^{-1}$ is a block diagonal matrix with Jordan blocks on the diagonal and zeros
elsewhere.


5.9 The Cayley-Hamilton Theorem. Let A be an $n \times n$ matrix over a
field F. Then the $n^2 + 1$ matrices $A^0 = I, A^1 = A, A^2, \dots, A^{n^2}$ lie in the
$n^2$-dimensional vector space $M_n(F)$, and so they are linearly dependent. Hence A
satisfies a polynomial equation of degree at most $n^2$.

The Cayley-Hamilton Theorem shows that there is a specific polynomial 
equation of degree n which is satisfied by A, the so-called characteristic equation 
of A. 

The minimal polynomial of A is the monic polynomial of least degree
which is satisfied by A. It is unique. Indeed,
$$J = \{f \in F[x] : f(A) = O\}$$
is an ideal of F[x], and the minimal polynomial of A is the unique monic generator
of this ideal. Hence any polynomial f such that f(A) = O is divisible by the
minimal polynomial.

Definition The characteristic polynomial of the matrix A is the polynomial
det(xI − A).


Theorem 5.21 (The Cayley-Hamilton Theorem) Let c(x) and m(x) be
the characteristic and minimal polynomials of the matrix A. Then:


(a) c(A) = O;
(b) m(x) divides c(x).


Remarks The two parts of the theorem are clearly equivalent. The theorem 
can be proved by a direct calculation, which does not require all the background 
of the rational canonical form. But the proof given here may provide more insight. 


Proof The strategy is in three parts. First, we show that it suffices to deal 
with matrices in rational canonical form. Next, we show that it suffices to deal 
with companion matrices of polynomials. Finally, we prove the theorem directly 
for these matrices. The proof is an illustration of the way in which a canonical 
form theorem can be used to simplify calculations. 


Step 1 We show that, if P is invertible, then A and $PAP^{-1}$ have the same
characteristic polynomials and the same minimal polynomials. Since, for every A,
there is an invertible P such that $PAP^{-1}$ is in rational canonical form, it suffices
to prove the theorem for these.

For the characteristic polynomial, we have
$$\begin{aligned}
\det(xI - PAP^{-1}) &= \det(P(xI - A)P^{-1}) \\
&= \det(P)\det(xI - A)\det(P)^{-1} \\
&= \det(xI - A),
\end{aligned}$$
using the multiplicative property of determinants.
For the minimal polynomial, we observe that, if f is any polynomial, then
$$f(PAP^{-1}) = Pf(A)P^{-1},$$
so f(A) = O if and only if $f(PAP^{-1}) = O$.


Step 2 Let A be in rational canonical form. Thus, A has diagonal blocks
$C(f_1), C(f_2), \dots, C(f_r)$, and O elsewhere, where C(f) is the companion matrix
of f, and $f_1, \dots, f_r$ are the invariant factors of A (so that $f_i$ divides $f_{i+1}$ for
$1 \le i \le r - 1$). We claim that a companion matrix C(f) has characteristic and
minimal polynomial both equal to f. (The proof of this is given in Step 3.) Now
the determinant of a block diagonal matrix is the product of the determinants
of the diagonal blocks. Hence the characteristic polynomial of A is the product
of the characteristic polynomials of $C(f_1), \dots, C(f_r)$; that is, it is $f_1 \cdots f_r$.
Also, $f_i$ is the minimal polynomial of $C(f_i)$. Since $f_i$ divides $f_r$ for all i, we
see that $f_r(C(f_i)) = O$ for all i, and hence that $f_r(A) = O$. So the minimal
polynomial divides $f_r$, and hence divides the characteristic polynomial.
In fact, $f_r$ is the minimal polynomial, since a polynomial of smaller degree
would not be satisfied by the block $C(f_r)$.


Step 3 Let A = C(f), where
$$f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0.$$
Thus,
$$xI - A = \begin{pmatrix}
x & -1 & 0 & \cdots & 0 & 0 \\
0 & x & -1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & x & -1 \\
a_0 & a_1 & a_2 & \cdots & a_{n-2} & x + a_{n-1}
\end{pmatrix}.$$
We prove that det(xI − A) = f(x) by induction on n.

For n = 1, we have $f(x) = x + a_0$ and $A = (-a_0)$, so the assertion is true.

Suppose that it holds for n − 1. Consider the formula for det(xI − A) as a
sum over permutations. Since there are only two non-zero elements in the first
row, the only permutations g which contribute to the sum are those with 1g = 1
or 1g = 2.

If 1g = 1, the (1, 1g) entry of xI − A is equal to x. Apart from this factor,
the terms are just those in the determinant of the matrix with the first row and
column deleted. This matrix is xI − B, where B is the companion matrix of
the polynomial $x^{n-1} + a_{n-1}x^{n-2} + \cdots + a_1$. By the inductive hypothesis, the
contribution is $x(x^{n-1} + \cdots + a_1)$.

If 1g = 2, then $2g \ne 2$. The only other non-zero element in the second row
is in the third column, so we can assume that 2g = 3. Similarly 3g = 4, ...,
(n − 1)g = n, ng = 1. Thus, g is a cyclic permutation, and its sign is $(-1)^{n-1}$.
The product of the corresponding entries is $(-1)^{n-1}a_0$, since the (i, i + 1) entry of xI − A is −1
for $1 \le i \le n - 1$, while the (n, 1) entry is $a_0$. So, taking the sign of g into account, there is a single term $a_0$.

Thus
$$\det(xI - A) = x(x^{n-1} + \cdots + a_1) + a_0 = f(x),$$
as required.

For the minimal polynomial, let V be a vector space with basis $v_1, \dots, v_n$,
and let $\theta$ be a linear transformation of V with matrix A relative to this basis.
Thus $v_i\theta = v_{i+1}$ for i = 1, ..., n − 1, while
$$v_n\theta = -a_0v_1 - \cdots - a_{n-1}v_n.$$
Thus, $v_i = v_1\theta^{i-1}$ for i = 1, ..., n, while $v_1f(\theta) = 0$. Then
$$v_if(\theta) = v_1\theta^{i-1}f(\theta) = v_1f(\theta)\theta^{i-1} = 0,$$
so $f(\theta)$ is represented by the zero matrix. Thus, f(A) = O. No polynomial
equation of smaller degree can be satisfied by A, since the vectors
$v_1, v_1\theta, \dots, v_1\theta^{n-1}$ are linearly independent; so f(x) is the minimal
polynomial.

The proof is complete. 
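The conclusions of Step 3 can be checked numerically on a small case. The sketch below is illustrative only: it computes the characteristic polynomial by the Faddeev-LeVerrier recurrence (a different technique from the permutation expansion used in the proof, and one that divides by 1, ..., n, so it assumes characteristic zero), using exact rational arithmetic. The function names are hypothetical, and the sample matrix is the companion matrix of $x^3 - 2x^2 + 3x - 5$, chosen arbitrarily.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(xI - A), by Faddeev-LeVerrier."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]        # M_0 = 0
    for k in range(1, n + 1):
        M = mat_mul(A, M)                            # M_k = A M_{k-1} + c_{n-k+1} I
        for i in range(n):
            M[i][i] += coeffs[n - k + 1]
        AM = mat_mul(A, M)
        coeffs[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return coeffs


def poly_at_matrix(coeffs, A):
    """Evaluate sum_k coeffs[k] * A^k by Horner's rule."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


# Companion matrix of f(x) = x^3 - 2x^2 + 3x - 5 (a_0 = -5, a_1 = 3, a_2 = -2).
C = [[0, 1, 0],
     [0, 0, 1],
     [5, -3, 2]]
c = char_poly(C)              # coefficients of det(xI - C), constant term first
zero = poly_at_matrix(c, C)   # the matrix c(C)
```

Here `c` recovers the coefficients of f, and `zero` is the zero matrix, as the Cayley-Hamilton Theorem predicts.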


Note that the minimal polynomial of a matrix is equal to the last invariant
factor, while the characteristic polynomial is the product of all the invariant
factors.

This observation adds a little extra to the Cayley-Hamilton Theorem: 


Proposition 5.22 An irreducible polynomial divides the characteristic polyno- 
mial if and only if it divides the minimal polynomial. 


Proof Since the minimal polynomial divides the characteristic polynomial, the 
reverse implication is clear. Conversely, any irreducible which divides the char- 
acteristic polynomial must divide one of the invariant factors, and hence must 
divide the last invariant factor, which is the minimal polynomial. 


We conclude with a very brief discussion of one of the most important topics 
in linear algebra, namely, eigenvalues and eigenvectors. Again, this approach is 
not the most direct, but does show the usefulness of the rational canonical form. 


Definition An eigenvector of the $n \times n$ matrix A over F is a non-zero vector
$v \in F^n$ such that $vA = \lambda v$ for some scalar $\lambda$. The corresponding eigenvalue of
A is $\lambda$.


Theorem 5.23 Let $A \in M_n(F)$. The following conditions for the scalar $\lambda$ are
equivalent:


(a) $\lambda$ is an eigenvalue of A;
(b) $\lambda$ is a root of the characteristic polynomial of A;
(c) $\lambda$ is a root of the minimal polynomial of A.


Proof (a) implies (c): Let f(x) be the minimal polynomial of A. If $vA = \lambda v$
with $v \ne 0$, then $0 = vf(A) = f(\lambda)v$; so $f(\lambda) = 0$.

(c) implies (b) by the Cayley-Hamilton Theorem 5.21(b).

(b) implies (a): If $\det(\lambda I - A) = 0$, then $\lambda I - A$ is not invertible. A non-zero
vector $v \in \mathrm{Ker}(\lambda I - A)$ is an eigenvector of A with eigenvalue $\lambda$.


The concepts of eigenvalue and eigenvector can be applied to linear transfor- 
mations also. We omit the details. 


5.10 An application: league tables. In many league competitions, teams
are awarded a fixed number of points for a win or a draw. It may happen that
two teams win the same number of matches and so are equal on points, but
the opponents beaten by one team are clearly ‘better’ than those beaten by the
other. How can we take this into account?

You might think of giving each team a ‘score’ to indicate how strong it is,
and then adding the scores of all the teams beaten by team T to see how well
T has performed. Of course this is self-referential, since the score of T depends
on the scores of the teams that T beats. So suppose we ask simply that the
score of T should be proportional to the sum of the scores of all the teams
beaten by T.

Now we can translate the problem into linear algebra. Let $T_1, \dots, T_n$ be the
teams in the league. Let A be the $n \times n$ matrix whose (i, j) entry is equal to
1 if $T_j$ beats $T_i$, and 0 otherwise. Now for any vector $x = (x_1, x_2, \dots, x_n)$ of scores,
the jth entry of xA is equal to the sum of the scores $x_i$ for all teams $T_i$ beaten
by $T_j$. So our requirement is simply that

x should be an eigenvector of A with all entries positive.
Here is an example. There are six teams A, B, C, D, E, and F. Suppose that 
A beats B, C, D, E; 
B beats C, D, E, F; 
C beats D, E, F; 
D beats E, F; 


E beats F; 
F beats A. 


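The matrix A is
$$A = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 0
\end{pmatrix}.$$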


We see that A and B each have four wins, but that A has generally beaten the 
stronger teams; there was one upset when F beat A. Also, E and F have the 
fewest wins, but F took A’s scalp and should clearly be better. 

Calculation with MAPLE shows that the vector 


(0.7744, 0.6452, 0.4307, 0.2875, 0.1920, 0.3856) 


is an eigenvector of A with eigenvalue 2.0085. This confirms our view that A is 
top of the league and that F is ahead of E; it even puts F ahead of D. 
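The eigenvector quoted above can be reproduced without a computer algebra system. The following Python sketch uses power iteration, which is an assumption of this illustration (the text only says that MAPLE was used): starting from equal scores, it repeatedly replaces each team's score by the sum of the scores of the teams it beat, and rescales. The names `TEAMS`, `BEATS`, and `league_scores` are hypothetical, and the method relies on the iteration converging, which it does for this set of results.

```python
# Results from the example: BEATS[t] lists the teams beaten by team t.
TEAMS = "ABCDEF"
BEATS = {"A": "BCDE", "B": "CDEF", "C": "DEF", "D": "EF", "E": "F", "F": "A"}


def league_scores(beats, teams, iterations=500):
    """Return (estimated eigenvalue, dict of scores summing to 1)."""
    scores = {t: 1.0 / len(teams) for t in teams}
    eigenvalue = 0.0
    for _ in range(iterations):
        # New score of t: the sum of the current scores of the teams t beat.
        new = {t: sum(scores[u] for u in beats[t]) for t in teams}
        eigenvalue = sum(new.values())   # valid because the scores sum to 1
        scores = {t: new[t] / eigenvalue for t in teams}
    return eigenvalue, scores


eigenvalue, scores = league_scores(BEATS, TEAMS)
ranking = sorted(TEAMS, key=lambda t: scores[t], reverse=True)
# Rescale so that A's score matches the value quoted in the text.
scaled = {t: scores[t] * 0.7744 / scores["A"] for t in TEAMS}
```

An eigenvector is only determined up to a scalar multiple, which is why the scores are rescaled before being compared with the quoted vector; `ranking` lists the teams from strongest to weakest.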

But perhaps there is a different eigenvalue and/or eigenvector which would 
give us a different result? 

In fact, there is a general theorem called the Perron-Frobenius theorem
which gives us conditions for this method to give a unique answer. Before we
state it, we need a definition.

Let A be an $n \times n$ real matrix with all its entries non-negative. We say that
A is indecomposable if, for any i, j with $1 \le i, j \le n$, there is a number m
such that the (i, j) entry of $A^m$ is strictly positive.

This odd-looking condition means, in our football league situation, that for
any two teams $T_i$ and $T_j$, there is a chain $T_{k_1}, \dots, T_{k_m}$, with $T_{k_1} = T_j$ and
$T_{k_m} = T_i$, such that each team in the chain beats the next one. Now it can be
shown that the only way that this can fail is if there is a collection C of teams
such that each team in C beats each team not in C. In this case, obviously
the teams in C occupy the top places in the league, and we have reduced the
problem to ordering these teams. So we can assume that the matrix of results is
indecomposable.

In our example, we see that B beats F beats A, so the (1,2) entry in $A^2$ is
non-zero. Similarly for all other pairs. So A is indecomposable in this case.


Theorem 5.24 (Perron-Frobenius Theorem) Let A be an $n \times n$ real matrix
with all its entries non-negative, and suppose that A is indecomposable. Then,
up to scalar multiplication, there is a unique eigenvector $x = (x_1, \dots, x_n)$
for A with the property that $x_i > 0$ for all i. The corresponding eigenvalue is the
largest eigenvalue of A.


So the Perron—Frobenius eigenvector solves the problem of ordering the teams 
in the league. 


Remarks 1. Further refinements are clearly possible. For example, instead of
just putting the (i, j) entry equal to 1 if $T_j$ beats $T_i$, we could take it to be the
number of goals by which $T_j$ won the game.

2. This procedure has wider application. How does an Internet search engine 
find the most important web pages that match a given query? An impor- 
tant web page is one to which a lot of other web pages link; this can be 
described by a matrix, and we can use the Perron—Frobenius eigenvector to do the 
ranking. 


Exercise 5.4 (a) Prove that $p(n) \le 2^n$ for all n. [Hint: The number of partitions of n
containing at least one part i for $i \le n$ is p(n − i); so
$$p(n) \le \sum_{i=1}^{n} p(n - i),$$
where we have $\le$ since some partitions are counted more than once. Now use induction.]
(b) Hence show that $\mathrm{fa}(p^m) \le 2^m$ for any prime p.
(c) Hence show that $\mathrm{fa}(n) \le n$ for any n.


Exercise 5.5 Prove the Cayley-Hamilton Theorem for 2 x 2 and 3 x 3 matrices by
direct calculation. [This was done by Cayley, and the 4 x 4 case by Hamilton; it was
Frobenius who produced the first general proof.]


Exercise 5.6 Prove that two 3 x 3 complex matrices are similar if and only if they
have the same characteristic and minimal polynomials.
Is the same true for 4 x 4 matrices?


Exercise 5.7 Show that the invariant factors of a matrix A over a field F are the
non-constant diagonal elements in the Smith normal form of the matrix xI − A over
F[x].

Exercise 5.8 The name rational canonical form comes from the fact that if a
matrix A has rational entries, then so does its rational canonical form, even if we work
over a larger field. More generally, if A is an $n \times n$ matrix over F, and K is a field
containing F, then the rational canonical forms of A over F and over K are identical.
Prove this.

[This is not true for the primary rational canonical form, since enlarging the field
may cause irreducible polynomials to become reducible. For example, the real matrix
$C(x^2 + 1) = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is in (primary) rational canonical form over R, but over C, its
Jordan form is $\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$.]


Exercise 5.9 (a) Prove that, for any eigenvalue $\lambda$ of the matrix A, the set of
eigenvectors with eigenvalue $\lambda$, together with the zero vector, is a subspace of $F^n$.


(b) Prove that, if $v_1, \dots, v_k$ are eigenvectors associated with distinct eigenvalues
$\lambda_1, \dots, \lambda_k$ respectively, then $v_1, \dots, v_k$ are linearly independent.


Exercise 5.10 Find the eigenvalues and eigenvectors of the real matrix 


11 -1 -4 
-1 11 -4 
-4 -4 14 


Exercise 5.11 Give a proof of Theorem 5.23 in the order (a) implies (b) implies (c) 
implies (a). 


Exercise 5.12 In Section 3.2, we made the easy observation that any abelian group 
is the additive group of a ring. Prove that any finitely generated abelian group is the 
additive group of a ring with identity. 


Exercise 5.13 An abelian group G is generated by elements $x_1, \dots, x_n$ satisfying the
relations
$$a_{i1}x_1 + \cdots + a_{in}x_n = 0$$
for i = 1, ..., n, where $a_{ij} \in \mathbb{Z}$. (We assume that all relations satisfied by $x_1, \dots, x_n$
are consequences of these.) Let A be the matrix $(a_{ij})$. Prove that


(a) if det(A) = 0, then G is infinite;
(b) if $\det(A) \ne 0$, then $|G| = |\det(A)|$.


Exercise 5.14 An abelian group G is generated by elements x, y, z satisfying the
relations


6x + 10y = 0,
6x + 15z = 0,
10y + 15z = 0


(with the same convention as in the preceding exercise). Write G as a direct sum of
cyclic groups.


6 The number systems 


The nineteenth-century German mathematician Leopold Kronecker said, ‘God
made the integers; the rest is the work of man’. Many others, including
non-mathematicians, have felt similarly:


What could be more general than 2, which can represent two galax- 
ies or two pickles, or one galaxy plus one pickle (the mind doth 
boggle), or just 2 gently bobbing—where? It, like God, is an “I 
am” and many have thought that it must be a precipitate of ulti- 
mate reality. 

Alfred W. Crosby (1997). 


We may take it that by the integers, Kronecker meant the counting num- 
bers (the positive integers). Any civilisation that has left records knew how to 
count. The other number systems (zero and negative integers, rational numbers, 
real numbers, and complex numbers) are very much more recent. Historians of 
mathematics can trace for us the origin and development of these systems. 

If we start with the positive integers, we have a system in which addition 
and multiplication can be performed, but subtraction and division cannot. More 
precisely, subtraction and division are not operations on natural numbers; they 
do not satisfy the closure law. In Section 2.14, we saw how to embed an integral 
domain into a field (its field of fractions); in other words, such a ring can be 
embedded in a larger ring in which division is possible. The prototype for this 
process is the enlargement of the integers to the rational numbers. We will see 
that a very similar process can be used to build the integers from the natural 
numbers. 

It is more difficult to construct the real numbers from the rationals. We want
to enlarge the rationals to include not only the roots of polynomials (such as
$\sqrt{2}$), but also other useful numbers such as $\pi$ and e. In effect, we have to plug
the gaps in the rationals, and this is not entirely an algebraic process; some ideas
from analysis, such as Cauchy sequences, are needed in this procedure. Once we
have the real numbers, we obtain the complex numbers by a process that we
have already seen in Section 2.16: adjoining a root of the polynomial equation
$x^2 + 1 = 0$.

However, science marches on, and various tasks that were once God’s preserve
are now carried out by white-coated technicians. So it is that mathematical
logicians have created the natural numbers, and indeed have created them out
of nothing (more precisely, starting from the empty set).

In the first section of this chapter, we will look at these constructions in
some detail. In the second section, we examine the distinction between algebraic
numbers (those which satisfy some polynomial equation with integer coefficients)
and transcendental numbers (those which do not): we prove in three different
ways that transcendental numbers exist, and show that the classical problems of
squaring the circle, duplicating the cube and trisecting the angle are insoluble
with ruler and compass. In the final section, we treat in a little more detail some
aspects of set theory: cardinality and the Axiom of Choice.


To the complex numbers 


6.1 The natural numbers. The natural numbers were invented for the 
purpose of counting. It is easy to believe that the earliest pastoral societies 
relied on counting for keeping tallies of their flocks. There is some evidence that 
this was done by establishing a bijection between the animals in a flock and a 
collection of pebbles or marks on a stick. A more sophisticated method is to have 
names for the natural numbers, independent of any physical representation of 
them. From that, it is a short step to algorithmic manipulation of numbers, so 
that subtraction can be used to establish how many animals are missing when 
the herd returns. 

It does not really matter what names are used for the numbers, any more 
than it matters what material is used to construct the standard metre. However, 
it is very convenient if, for example, the standard number 248 is a set with 248 
elements. If we adopt this principle, then zero should be a set with no elements. 
As we saw, there is a unique empty set, and this we take as zero. Then the 
number 1 should be a set with one element; using what we have to hand, we take 
it to be {0}. Then we take 2 = {0,1}, 3 = {0,1,2}, and so on. We see that each 
natural number is the set consisting of all smaller natural numbers (including 
zero). This leads us to the formal construction. 


Definition


• The empty set is a natural number (called zero, and written 0).
• If n is a natural number, then so is $n \cup \{n\}$.
• Every natural number is generated by these two rules.


The natural number $n \cup \{n\}$ is called the successor of n. We temporarily
write it as s(n); later we will call it n + 1.
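The construction can be imitated with Python's immutable sets. This is only a sketch to make the definition concrete: the function names are illustrative, and `frozenset` stands in for the sets of the formal theory.

```python
def zero():
    """The natural number 0: the empty set."""
    return frozenset()


def successor(n):
    """s(n) = n ∪ {n}."""
    return n | frozenset([n])


def natural(k):
    """The set representing the natural number k, built by k successor steps."""
    n = zero()
    for _ in range(k):
        n = successor(n)
    return n
```

With these definitions, `natural(3)` is the set whose elements are `natural(0)`, `natural(1)`, and `natural(2)`, and `len(natural(k))` is k, reflecting the principle that the number k should be a set with k elements.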

The most important property of the natural numbers is the principle of 
induction. 


Theorem 6.1 Let P(n) be a proposition about the natural number n. Suppose 
that 


(a) P(0) is true; and 
(b) P(n) implies P(s(n)) for any natural number n. 


Then P(n) holds for all natural numbers n.


Proof The definition of the natural numbers makes clear that the set of natural
numbers satisfying P is the same as the set of all natural numbers.


The Principle of Induction can be used for definitions as well as proofs. In 
order to define a function f on the set of natural numbers, it is enough to define 
f(0) and to define f(s(n)) in terms of f(n). For example, we define addition by 


$$m + 0 = m, \qquad m + s(n) = s(m + n),$$
for all natural numbers m and n, and then define multiplication by
$$m \cdot 0 = 0, \qquad m \cdot s(n) = m \cdot n + m,$$
for all m and n. From these definitions, it is possible to prove all the ‘elementary’
properties of addition and multiplication. The proofs are quite complicated to
follow, since they use the idea of ‘double induction’. We are trying to prove a
proposition such as the commutative law m + n = n + m which depends on two
variables. We prove it by induction on n, so we have to show that m + 0 = 0 + m
and also that $(m + n = n + m) \Rightarrow (m + s(n) = s(n) + m)$. Each of these
sub-propositions is proved by induction on m (since at the beginning, induction is
the only tool we have!). We can formalise ‘double induction’ as follows:


Theorem 6.2 Let P(m,n) be a proposition about pairs of natural numbers. 
Assume that: 


(a) P(0,0) is true. 

(b) P(0,n) implies P(0,s(n)) for all n. 

(c) P(m,0) implies P(s(m),0) for all m. 

(d) For a given value of m, from the truth of P(m,x) for all x, and also that 
of P(s(m),n) for some n, we can infer the truth of P(s(m), s(n)). 


Then P(m,n) is true for all m and n. 


Proof Hypotheses (a) and (b) are the base case and inductive step for the
proof of P(0, n) for all n. Similarly, hypotheses (c) and (d) are the base case and
inductive step for a proof (by induction on n) of the statement that, if
P(m, n) holds for all n, then P(s(m), n) holds for all n. Together these show by
induction on m that P(m, n) holds for all m and n.


Using this principle, we show the commutative law for addition.

Proposition 6.3 m + n = n + m for all natural numbers m, n.

Proof For double induction, we have to show:


(a) 0 + 0 = 0 + 0;
(b) 0 + n = n + 0 implies 0 + s(n) = s(n) + 0;
(c) m + 0 = 0 + m implies s(m) + 0 = 0 + s(m);
(d) m + x = x + m for all x, and s(m) + n = n + s(m), imply s(m) + s(n) =
s(n) + s(m).


Here (a) is a triviality. To prove (b), assume that 0 + n = n + 0; that is, 0 + n =
n, since n + 0 = n by definition of addition. Now, by definition, 0 + s(n) = s(0 + n),
which is equal to s(n) by the assumption, and is equal to s(n) + 0 by the definition
of addition again. Now (c) is just (b) in disguise, so it is also true.

To prove (d), we assume that m + x = x + m for all x (and the fixed value
of m), and that s(m) + n = n + s(m) (where n has some fixed value also).
We have
$$s(m) + s(n) = s(s(m) + n) = s(n + s(m)) = s(s(n + m)),$$
where the first and third equalities follow from the definition of addition, and
the second from our assumptions. Similarly,
$$s(n) + s(m) = s(s(n) + m) = s(m + s(n)) = s(s(m + n)).$$
By assumption, m + n = n + m; it follows that s(m) + s(n) = s(n) + s(m), as
required.


It follows from the definition of addition that n+1= n+ s(0) = s(n). So we 
can replace the notation s(n) by the more familiar n + 1. 
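The recursive definitions of addition and multiplication can be run directly. In this sketch, Python's non-negative integers serve as a model of the natural numbers, with `s` as the only arithmetic primitive; using `n - 1` to recover the predecessor of a non-zero number is a convenience of the model (it plays the role of reading n as s(n') for some n'), and the function names are illustrative.

```python
def s(n):
    """The successor function, the only primitive operation used."""
    return n + 1


def add(m, n):
    """m + 0 = m;  m + s(n) = s(m + n)."""
    if n == 0:
        return m
    return s(add(m, n - 1))


def mul(m, n):
    """m . 0 = 0;  m . s(n) = m . n + m."""
    if n == 0:
        return 0
    return add(mul(m, n - 1), m)
```

Because each call peels off one successor, the recursion depth grows with the second argument, so this is suitable only for small numbers. It does make it easy to spot-check laws such as m + n = n + m before proving them by induction.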

Various further properties can be proved in a similar way. I will not give the 
proofs. Here is a list of what is needed; if you have great powers of perseverance, 
and work through them all, you will have put the natural numbers on a firm 
logical basis. 


Closure laws: For all $a, b \in \mathbb{N}$, we have $a + b, ab \in \mathbb{N}$.

Associative laws: For all $a, b, c \in \mathbb{N}$, we have a + (b + c) = (a + b) + c and
a(bc) = (ab)c.

Commutative laws: For all $a, b \in \mathbb{N}$, we have a + b = b + a and ab = ba.

Distributive law: For all $a, b, c \in \mathbb{N}$, we have a(b + c) = ab + ac.

Zero law: For all $a \in \mathbb{N}$, we have a + 0 = a.

Identity law: For all $a \in \mathbb{N}$, we have $a \cdot 1 = a$.

Cancellation laws: If a + b = a + c, then b = c; if $a \ne 0$ and ab = ac, then
b = c.


We can also define the usual ordering on the natural numbers: $a \le b$ holds if
and only if there is a natural number c such that a + c = b.


6.2 The integers. The motivation for extending the natural numbers to
include negative numbers is to allow subtraction; more precisely, to produce
a number system which is closed under subtraction. Accordingly, we want to
construct numbers called a − b for all $a, b \in \mathbb{N}$, where a − b is a solution x to
the equation x + b = a. There are a couple of problems. First, if $a \ge b$, then
N already contains an element a − b. Second, the same integer will have many
different names, since (for example) 2 − 3 = 5 − 6.

Accordingly, we represent the integer a − b by the ordered pair (a, b), and we
want to ensure that, if a − b = c − d (in other words, if a + d = b + c), then
the pairs (a, b) and (c, d) should represent the same integer. This is a job for an
equivalence relation!

So here is the formal definition:

Define a relation ~ on the set of ordered pairs of natural numbers by the
rule that (a, b) ~ (c, d) if and only if a + d = b + c. Then ~ is an equivalence
relation:


• Since a + b = b + a, we have (a, b) ~ (a, b); so ~ is reflexive.

• If a + d = b + c, then c + b = d + a; so (a, b) ~ (c, d) implies (c, d) ~ (a, b),
and ~ is symmetric.

• Suppose that (a, b) ~ (c, d) and (c, d) ~ (e, f). Then a + d = b + c and
c + f = d + e. Thus
$$a + c + f = a + d + e = b + c + e,$$
and the cancellation law implies that a + f = b + e; that is, (a, b) ~ (e, f),
and ~ is transitive.


Definition An integer is an equivalence class of the relation ~. We denote
the equivalence class containing (a, b) temporarily by [a, b]. We define addition
and multiplication of equivalence classes by the rules
$$[a, b] + [c, d] = [a + c, b + d],$$
$$[a, b] \cdot [c, d] = [ac + bd, ad + bc].$$
Let Z denote the set of integers with these operations.


Where do these definitions come from? The symbol [a, b] is going to be the
integer a − b; and we have
$$a - b = c - d \iff a + d = b + c,$$
$$(a - b) + (c - d) = (a + c) - (b + d),$$
$$(a - b) \cdot (c - d) = (ac + bd) - (ad + bc).$$


Theorem 6.4 The set Z, with the above-defined operations, is a commutative 
ring with identity, and is an integral domain. 


Proof Before we begin the verification of the axioms, we must first prove that
the operations are well defined. That is, we must prove that, if (a, b) ~ (a′, b′)
and (c, d) ~ (c′, d′), then (a + c, b + d) ~ (a′ + c′, b′ + d′) and (ac + bd, ad + bc) ~
(a′c′ + b′d′, a′d′ + b′c′). In yet other words, we must show that, if a + b′ = b + a′
and c + d′ = d + c′, then (a + c) + (b′ + d′) = (b + d) + (a′ + c′) and (ac + bd) +
(a′d′ + b′c′) = (ad + bc) + (a′c′ + b′d′). These facts follow from the properties of
N by tedious but elementary algebraic manipulations.

Now we verify the eight ring axioms, the commutative law, and the existence
of an identity. Relatively straightforward calculations are needed. Take the left
distributive law as an example. For any natural numbers a, b, c, d, e, f,


[a, b]([c, d] + [e, f]) = [a, b][c + e, d + f]
                        = [a(c + e) + b(d + f), a(d + f) + b(c + e)],

[a, b][c, d] + [a, b][e, f] = [ac + bd, ad + bc] + [ae + bf, af + be]
                            = [ac + bd + ae + bf, ad + bc + af + be],


and the equality of the right-hand sides follows from properties of N.

The zero element is the class [a,a]; the negative of [a,b] is [b,a]; and the 
identity is the class [1, 0]. 

To see that Z is an integral domain, it is convenient to choose representatives
for the equivalence classes as follows: If a ≥ b, then [a, b] = [a − b, 0]; if a < b,
then [a, b] = [0, b − a]. So we may assume that either a or b is zero. Now take two
non-zero integers, each represented as either [a, 0] or [0, b] with a or b non-zero.
For their product there are several cases. For example,


[a, 0] · [0, b] = [0, ab] ≠ 0,


since if a, b ≠ 0 then ab ≠ 0 by the Cancellation Law for N.


The set N is embedded isomorphically into Z by the map taking a to [a, 0].
Since [a, b] + [b, 0] = [a + b, b] = [a, 0], we see that x = [a, b] is the solution to the
equation x + b = a, where we identify a and b with the corresponding integers
[a, 0] and [b, 0] respectively. So we can denote [a, b] by the more usual notation
a − b. Moreover,


[a, b] − [c, d] = [a + d, b + c],


so subtraction is everywhere defined on Z.

The ordering on Z can be defined by the rule that a ≤ b if and only if b − a
is a natural number. Equivalently, [a, b] ≤ [c, d] if and only if the inequality
a + d ≤ b + c holds in the natural numbers.
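The construction of Z can be tried out on a computer. The following is only a sketch in Python, not part of the formal development: it assumes that Python's non-negative int values stand in for the natural numbers, and the function names (equivalent, add, mul, sub, leq, normal_form) are invented here for illustration. A pair (a, b) plays the role of the class [a, b].

```python
def equivalent(x, y):
    """(a, b) ~ (c, d) if and only if a + d = b + c."""
    (a, b), (c, d) = x, y
    return a + d == b + c


def add(x, y):
    """[a, b] + [c, d] = [a + c, b + d]."""
    (a, b), (c, d) = x, y
    return (a + c, b + d)


def mul(x, y):
    """[a, b] . [c, d] = [ac + bd, ad + bc]."""
    (a, b), (c, d) = x, y
    return (a * c + b * d, a * d + b * c)


def sub(x, y):
    """[a, b] - [c, d] = [a + d, b + c]."""
    (a, b), (c, d) = x, y
    return (a + d, b + c)


def leq(x, y):
    """[a, b] <= [c, d] if and only if a + d <= b + c in the natural numbers."""
    (a, b), (c, d) = x, y
    return a + d <= b + c


def normal_form(x):
    """Representative of the form (n, 0) or (0, n), as in the integral domain argument."""
    a, b = x
    return (a - b, 0) if a >= b else (0, b - a)
```

For instance, (2, 5) and (0, 3) both name the integer usually written −3. Replacing one by the other in a sum or product changes the resulting pair but not its class, which is exactly what 'well defined' means.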


6.3 The rational numbers. The construction of the rational numbers from
the integers is quite similar to the construction of the integers from the natural
numbers. In Z, we can add, subtract, and multiply, but not always divide; we
want to add elements a/b, that is, solutions of bx = a, whenever b ≠ 0. Similar
comments about the non-uniqueness of the representation a/b apply. If you
studied Section 2.14, you will have seen the process: Q is the field of fractions of Z.
I will run through the construction more briefly.

Let X be the set of all pairs (a, b) of integers with b ≠ 0. Define a relation ~
on X by the rule that (a, b) ~ (c, d) if and only if ad = bc. This relation is:


• reflexive, since ab = ba;

• symmetric, since ad = bc implies cb = da;

• transitive, since ad = bc and cf = de imply adf = bcf = bde, and hence that
af = be (since d ≠ 0 and Z is an integral domain).


Hence ~ is an equivalence relation. We let [a,b] denote the equivalence class 
containing (a,b), and let Q be the set of equivalence classes (its elements are 
called rational numbers). 

Now we define addition and multiplication of rational numbers by the rules 


[a, b] + [c, d] = [ad + bc, bd],
[a, b] · [c, d] = [ac, bd].


(These are motivated by the rules for adding and multiplying fractions, since 
[a, b] is to represent the rational number a/b.) It can now be shown that these 
operations are well defined (independent of the choice of representatives), and 
also that the following holds: 


Theorem 6.5 Q, with the above operations, is a field. 


Moreover, Z is embedded isomorphically into Q by the map taking a to [a, 1].
If (as usual) we denote this element by the same symbol a, then if b ≠ 0 we have
b⁻¹ = [1, b], and hence [a, b] = ab⁻¹ = a/b, as we intended.

Of course, any equation bx = a, with a, b ∈ Q and b ≠ 0, can now be solved.
If a = [a_1, a_2] and b = [b_1, b_2] with b_1 ≠ 0, then a/b = [a_1 b_2, a_2 b_1].

The ordering of the rational numbers can be defined. The rational number
[a, b] is positive if a and b have the same sign (both positive or both negative);
then q ≤ r if and only if r − q is zero or positive.
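The same kind of experiment works for Q. The sketch below assumes that Python int values stand in for Z and that every pair passed in has a non-zero second component; the names q_equivalent, q_add, q_mul, q_div, q_is_positive and q_leq are hypothetical, chosen only for this illustration.

```python
def q_equivalent(x, y):
    """(a, b) ~ (c, d) if and only if ad = bc."""
    (a, b), (c, d) = x, y
    return a * d == b * c


def q_add(x, y):
    """[a, b] + [c, d] = [ad + bc, bd]."""
    (a, b), (c, d) = x, y
    return (a * d + b * c, b * d)


def q_mul(x, y):
    """[a, b] . [c, d] = [ac, bd]."""
    (a, b), (c, d) = x, y
    return (a * c, b * d)


def q_div(x, y):
    """[a_1, a_2] / [b_1, b_2] = [a_1 b_2, a_2 b_1], defined when b_1 is non-zero."""
    (a1, a2), (b1, b2) = x, y
    if b1 == 0:
        raise ZeroDivisionError("division by the rational number zero")
    return (a1 * b2, a2 * b1)


def q_is_positive(x):
    """[a, b] is positive when a and b have the same sign."""
    a, b = x
    return a * b > 0


def q_leq(x, y):
    """q <= r if and only if r - q is zero or positive."""
    a, b = x
    difference = q_add(y, (-a, b))
    return difference[0] == 0 or q_is_positive(difference)
```

The pairs (2, 4) and (1, 2) are different but equivalent, so a sum computed from either lands in the same class; q_equivalent is the test to use, not equality of pairs.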


6.4 The real numbers. At each stage after the natural numbers so far, 
we have been enlarging the number system so as to make an operation (either 
subtraction or division) defined everywhere it should be; in other words, to ensure 
that equations (x + b = a or bx = a respectively) have solutions. 

The next stage is different. There are still many equations which do not have 
solutions (for example, the equation x² = 2 which gave the Pythagoreans so
much trouble). There are also various non-algebraic equations that we would
like to solve, such as sin x = 1 or log x = 1. These fail to have solutions because,
although we can approximate the solutions as closely as we like, we cannot 
express them exactly by rational numbers. Although the rationals are dense 
(in the sense that between any two we can always find another), there are still 
many gaps between them that we have to fill. 

The technique follows the general outline that we have used before. First, 
we find a method to represent one of the ‘missing’ numbers. We calculate what
it means for two such representations to define the same number, and hence
define an equivalence relation on them. Then we define a ‘real number’ to be an 
equivalence class of this relation, and give rules for addition and multiplication of 
real numbers. However, unlike the previous case, we can now use the machinery 
of rings and ideals to help. 

There are two methods commonly used for the construction of the real
numbers, namely Cauchy sequences and Dedekind cuts. The fields constructed
by these two approaches turn out to be isomorphic. I will use Cauchy sequences,
which permit a more algebraic approach. The motivation is the representation
of a real number by an infinite decimal expansion. Such a decimal has the form
n.a_1a_2a_3..., where n is an integer and a_1, a_2, a_3, ... are decimal digits. This
number is the limit of the sequence


n, n.a_1 = n + a_1/10, n.a_1a_2 = n + a_1/10 + a_2/100, ...


Each term in the sequence is rational, and the sequence is a Cauchy sequence
according to the following definition. We use the notation |q| for the modulus
of the rational number q; that is, |q| = q if q ≥ 0, and |q| = −q if q < 0.


Definition


• A Cauchy sequence of rational numbers is defined to be a sequence
q_0, q_1, q_2, ... with q_n ∈ Q for all n, satisfying the following condition: for any
positive rational ε, there exists a positive integer N such that |q_m − q_n| < ε
whenever m, n > N.

• A null sequence of rational numbers is a sequence q_0, q_1, q_2, ... with q_n ∈ Q
for all n, satisfying the following condition: for any positive rational ε, there
exists a positive integer N such that |q_n| < ε whenever n > N.
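To make the definition concrete, here is a small Python sketch using exact rational arithmetic. It takes the decimal truncations of the square root of 2 as its example sequence (my choice, not the text's), and the names truncation and cauchy_spot_check are hypothetical. A program can only test finitely many pairs m, n, so this is a spot check of the Cauchy condition and not a proof.

```python
from fractions import Fraction
from math import isqrt


def truncation(n):
    """The decimal expansion of sqrt(2) cut off after n places, as an exact rational."""
    # isqrt(2 * 10^(2n)) is the integer part of sqrt(2) * 10^n.
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)


def cauchy_spot_check(seq, eps, N, upto):
    """Check |q_m - q_n| < eps for all m, n with N < m, n <= upto (finitely many pairs)."""
    terms = [seq(n) for n in range(N + 1, upto + 1)]
    return all(abs(x - y) < eps for x in terms for y in terms)
```

With ε = 1/10^5 the choice N = 5 works for this sequence, since all later terms agree to at least six decimal places. Note that the sequence with nth term 2 − q_n², where q_n = truncation(n), is a null sequence, although no rational number has square 2.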


We define addition and multiplication of sequences componentwise: that is,
if (q_n) and (r_n) denote the sequences with nth terms q_n and r_n respectively, their
sum has nth term q_n + r_n and their product has nth term q_n r_n.


Let C and N denote the sets of Cauchy sequences and null sequences 
respectively. 


Theorem 6.6 C is a commutative ring with identity, and N is a maximal ideal 
in C. 


Proof The proof that C is a commutative ring involves some fairly standard 
verification of axioms; a kind of mixture of algebra and analysis. Its zero and iden- 
tity are the constant sequences with values 0 and 1 respectively. Two examples 
will illustrate. 

Closure under multiplication: We use the fact that any Cauchy sequence is
bounded. For let (q_n) be a Cauchy sequence. Choosing ε = 1 in the definition,
let N have the property that |q_m − q_n| < 1 for m, n > N. It follows
that |q_n| < |q_{N+1}| + 1 for n > N. So, if B is the greatest of the numbers
|q_0|, |q_1|, ..., |q_N|, |q_{N+1}| + 1, then we have |q_n| ≤ B for all n.

Now let (q_n) and (r_n) be Cauchy sequences, bounded by B and C respectively.
Given ε > 0, choose N so that |q_m − q_n| < ε/2C for m, n > N, and choose M so
that |r_m − r_n| < ε/2B for m, n > M. Let P be the greater of M and N. Then,
for m, n > P, we have


|q_m r_m − q_n r_n| = |q_m r_m − q_m r_n + q_m r_n − q_n r_n|
                    ≤ |q_m| · |r_m − r_n| + |r_n| · |q_m − q_n|
                    < B(ε/2B) + C(ε/2C)
                    = ε.


So (q_n r_n) is a Cauchy sequence.

Commutative law for multiplication: The nth terms of (q_n)(r_n) and (r_n)(q_n)
are, respectively, q_n r_n and r_n q_n, which are equal, by the commutative law for
multiplication in Q.

It is not obvious that the set N of null sequences is a subset of C. Let (q_n) be
a null sequence, and let ε > 0 be given. Choose an integer N so that |q_n| < ε/2
for n > N (this is done by applying the definition of a null sequence with ε/2
replacing ε). Then, if m and n are both greater than N, we have |q_m| < ε/2 and
|q_n| < ε/2; so |q_m − q_n| < ε. Thus (q_n) is a Cauchy sequence.

Now further standard verification shows that N is an ideal of C. (We again
require the fact that Cauchy sequences are bounded.)

To show that N is a maximal ideal, let J be any ideal of C which properly
contains it; we must show that J = C. Take any sequence (q_n) which lies in J
but not in N. Then (q_n) is not a null sequence. Negating the definition of a
null sequence yields the following: there exists some ε > 0 such that, for any N,
there are terms q_n of the sequence with n > N such that |q_n| ≥ ε. But (q_n) is a
Cauchy sequence; so, taking ε/2 in the definition, we find a number M such that
|q_m − q_n| < ε/2 for all m, n > M. We know that there exists some m > M with
|q_m| ≥ ε; it follows that |q_n| > ε/2 for all n > M. In other words, apart from
finitely many terms at the start, the sequence (q_n) is bounded away from zero.

Now let (x_n) be any Cauchy sequence. Define a sequence (r_n) by

r_n = 0 if q_n = 0,
r_n = x_n/q_n if q_n ≠ 0.

The case q_n = 0 can only occur finitely often. Using this and the fact that q_n
is bounded away from zero for n > M, it can be shown that (r_n) is a Cauchy
sequence. Hence (r_n)(q_n) ∈ J.

But r_n q_n = x_n whenever q_n ≠ 0; that is, for all but finitely many values. So
the sequence (x_n − r_n q_n) is zero from some point onwards, and hence certainly
a null sequence, and thus in J. Then (x_n) = (x_n − r_n q_n) + (r_n)(q_n) ∈ J also.
Since (x_n) was an arbitrary Cauchy sequence, we have J = C, as required.


Definition A real number is an element of the ring R = C/N.


Theorem 6.7 R is a field.


This follows from the preceding theorem and Theorem 2.27. 

There is an isomorphic embedding of Q into R: we map any rational number
q to the coset of N containing the constant sequence (q). (This coset consists of
all sequences of rational numbers which have the limit q.) Furthermore, we can
now define Cauchy sequences of real numbers, and it is possible to show that 
every Cauchy sequence of real numbers converges to a real number; that is, the 
field of real numbers is complete. (To define Cauchy sequences of real numbers, 
we need first to define the modulus of a real number, which itself depends on the 
ordering, defined below.) 

In accordance with our original motivation, we note that the decimal
expansion of a real number does indeed give a Cauchy sequence representing that
number. Also, it can be shown that every real number has a decimal expansion
(that is, the coset of N representing that number contains a particular Cauchy
sequence of the type arising from a decimal expansion).

The ordering of the real numbers can be defined as follows: We say that a
Cauchy sequence (q_n) is positive if it is not a null sequence and q_n > 0 for
all but finitely many values of n. Then we say that (q_n) < (r_n) if (r_n − q_n) is
positive. Now it can be shown that, if (q_n) < (r_n), then (q′_n) < (r′_n) for any
Cauchy sequences (q′_n) and (r′_n) which differ from (q_n) and (r_n) respectively by
null sequences. This means that we have a well defined ordering on the cosets of
N in C, that is, on the real numbers.


6.5 The complex numbers. We have already met the construction of the
complex numbers from the reals. The aim is to enlarge the real numbers to a field
containing a square root of −1 (a root of the polynomial equation x² + 1 = 0).
We will see that we get much else too.

As an instance of the construction of a field extension in which a given
irreducible polynomial has a root (described in Section 2.16), we define the field C
of complex numbers as the factor ring R[x]/(x² + 1). (The polynomial x² + 1 is
irreducible in R[x], because a² + 1 ≥ 1 > 0 for all a ∈ R.) If i denotes the root
of the polynomial x² + 1 (that is, the coset (x² + 1) + x), then every element
of C can be written in the form a + bi, where a and b are real, and the
expression for a given complex number in this form is unique. Now the addition and
multiplication are given by the usual ‘rules of arithmetic’, putting i² = −1:


(a + bi) + (c + di) = (a + c) + (b + d)i,
(a + bi) · (c + di) = (ac − bd) + (ad + bc)i.
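The multiplication rule can be recovered mechanically from the factor ring description: multiply the representatives as polynomials, then take the remainder on division by x² + 1. The Python sketch below does this with integer coefficients standing in for real numbers (an assumption made so that the arithmetic is exact); the function names are hypothetical. A coset is stored as the list [a, b], meaning a + bx.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            result[i + j] += pi * qj
    return result


def reduce_mod_x2_plus_1(p):
    """Remainder of p on division by x^2 + 1, as [constant term, coefficient of x]."""
    remainder = [0, 0]
    for k, coeff in enumerate(p):
        # Modulo x^2 + 1, the powers of x cycle through 1, x, -1, -x.
        sign = -1 if k % 4 in (2, 3) else 1
        remainder[k % 2] += sign * coeff
    return remainder


def c_mul(z, w):
    """Product of the cosets of a + bx and c + dx in R[x]/(x^2 + 1)."""
    return reduce_mod_x2_plus_1(poly_mul(z, w))
```

For example, c_mul([1, 2], [3, 4]) multiplies out to 3 + 10x + 8x², and replacing x² by −1 gives −5 + 10x, in agreement with (ac − bd) + (ad + bc)i for a = 1, b = 2, c = 3, d = 4.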


One of the most important properties of C is the ‘Fundamental Theorem of
Algebra’. This shows that it is not necessary to construct still larger fields to
include roots of more complicated polynomials.


Theorem 6.8 (Fundamental Theorem of Algebra) Any non-constant
polynomial in C[x] has a root in C.


Despite its name, the ‘Fundamental Theorem of Algebra’ is not a theorem of
algebra at all. All known proofs of it (and there are many) require some
arguments from analysis. The best-known proof uses Liouville’s Theorem, a result
which comes at the end of a first course on complex analysis. Liouville’s Theorem
states that a complex analytic (that is, everywhere differentiable) function which
is bounded must be constant. If f were a non-constant polynomial with no roots
in C, then it can be shown that 1/f(z) → 0 as z → ∞, and hence that 1/f is
bounded; Liouville’s Theorem would then imply that 1/f (and hence also f) is
constant, a contradiction.

In Chapter 8, there is a completely different proof. It replaces most of the 
analysis by algebra, using only facts about R which are consequences of the 
Intermediate Value Theorem. By contrast, it uses some fairly sophisticated group 
theory. 


Definition We say that a field F is algebraically closed if it has the
property that any non-constant polynomial in F[x] has a root in F. Thus we can
express the conclusion of the Fundamental Theorem of Algebra more simply:
C is algebraically closed.


Exercise 6.1 Prove that 2 + 2 = 4.

Remark Bertrand Russell, in his History of Western Philosophy, says:


‘3’ means ‘2+1’, and ‘4’ means ‘3+1’. Hence it follows (though the
proof is long) that ‘4’ means the same as ‘2+2’. Thus mathematical
knowledge ceases to be mysterious.


Exercise 6.2 Prove that, if a and b are natural numbers (regarded as sets, as in the
construction), then the following are equivalent:


(a) a < b    (b) a ⊂ b    (c) a ∈ b.


Exercise 6.3 In the construction of natural numbers, we take


0 = ∅, 1 = {∅}, 2 = {∅, {∅}}, 3 = {∅, {∅}, {∅, {∅}}}, ...


So each natural number is represented by a string of symbols from the alphabet with
four symbols: ∅, opening and closing braces, and comma. Calculate the number of
occurrences of each symbol in the string representing n.


Exercise 6.4 Prove carefully that Z is a ring. 




Exercise 6.5 (∗) Prove the division algorithm for Z: if a, b ∈ Z with b > 0, then there
exist q, r ∈ Z with a = bq + r and 0 ≤ r < b.


Exercise 6.6 (∗) Prove the Principle of the Supremum for R: if S is a non-empty
subset of R which has an upper bound, then S has a least upper bound.


Algebraic and transcendental numbers 


Among the real (or complex) numbers, there are some (such as √2 or i) which
satisfy polynomial equations with integer coefficients, and others (like π and e)
which do not. In this section, we examine the distinction between the two classes
of numbers, and give an application to ruler-and-compass constructions.


6.6 Algebraic numbers. We followed the traditional approach to the con- 
struction of the number systems. Another way to proceed, having constructed 
the rationals, would be to add next the roots of polynomials, and then put in all 
the other useful numbers that we require. 

Instead of doing that, we now look back and examine the algebraic numbers 
(the roots of polynomials over the rationals), and establish that they do form a 
field. 


Definition Let F be a field, E a subfield of F, and a ∈ F. We say that a is
algebraic over E if there is a non-zero polynomial f ∈ E[x] such that f(a) = 0
(evaluated in F).


We will prove that the set of all elements of F which are algebraic over E is 
a field. In order to do this, we require some results about field extensions. 


Definition Let F be a field, E a subfield of F.


(a) For a ∈ F, the field generated by a over E is defined to be the smallest
subfield of F which contains both E and a, denoted by E(a).

(b) The degree of F over E, denoted by [F : E], is the dimension of F as
a vector space over E (when we allow multiplication only of elements of F
by elements of E, as in Example 3 in Section 4.2).


Proposition 6.9 Let E be a subfield of F, and a ∈ F. Then a is algebraic over
E if and only if [E(a) : E] is finite.


Proof Suppose first that [E(a) : E] = n is finite. Then the n + 1 elements
1, a, a², ..., a^n of the vector space E(a) over E must be linearly dependent. So
there exist scalars c_0, c_1, c_2, ..., c_n ∈ E, not all zero, such that


c_0 + c_1 a + c_2 a² + ··· + c_n a^n = 0.


This equation says that the non-zero polynomial f(x) = c_0 + c_1 x + ··· + c_n x^n
has a as a root; so a is algebraic over E.

Conversely, suppose that a is algebraic over E. Let f ∈ E[x] be a monic
polynomial of least degree satisfied by a. (This is called the minimal polynomial
of a over E.) Now f is irreducible in E[x]. For, if f = gh, where g and h have
smaller degree than f, then we have g(a)h(a) = 0 in F; so either g(a) = 0 or
h(a) = 0, contrary to the choice of f as the polynomial of smallest degree that
has a as a root.

Let g be any polynomial in E[x]. Obviously, if f divides g, then g(a) = 0.
Conversely, suppose that g(a) = 0. Write g = fq + r, where r = 0 or r has degree
less than deg(f). But r(a) = g(a) − f(a)q(a) = 0; so, by choice of f, we have
r = 0, and f divides g.

Now let θ be the ‘evaluation’ homomorphism from E[x] to F defined by
gθ = g(a). We have Ker(θ) = (f) (by the preceding paragraph: for this equation
says that g(a) = 0 if and only if f divides g). Hence Im(θ) ≅ E[x]/(f), by
the First Isomorphism Theorem. Since f is irreducible, E[x]/(f) is a field. Thus,
Im(θ) is a field containing E and a, and so it contains E(a). Also, we know from
Section 4.2 that E[x]/(f), and hence Im(θ), is a finite-dimensional vector space
over E. So [E(a) : E] is finite, as required.


Theorem 6.10 Suppose that E, F, G are fields with E ⊆ F ⊆ G. Then [G : E]
is finite if and only if both [G : F] and [F : E] are finite. If this holds, then


[G : E] = [G : F] · [F : E].


Proof If [G : E] is finite, then so is [F : E] (since F is a subspace of G, as
E-vector spaces); and so also is [G : F] (since a basis for G as E-vector space
certainly spans G as F-vector space).

Conversely, suppose that [F : E] = m and [G : F] = n are finite. Let
f_1, ..., f_m be a basis for F as E-vector space, and let g_1, ..., g_n be a basis for
G as F-vector space. We claim that the mn elements f_i g_j, for i = 1, ..., m and
j = 1, ..., n, form a basis for G as E-vector space. Proof of this claim will show
that [G : E] is finite, and also prove the product formula for the degree.

Spanning: Take a ∈ G. Express it in terms of the basis over F; say

a = b_1 g_1 + ··· + b_n g_n.

Now each b_j is an element of F, so can be written in terms of the basis over E:

b_j = c_{1j} f_1 + ··· + c_{mj} f_m.

Substitution gives

a = Σ_{i=1}^{m} Σ_{j=1}^{n} c_{ij} f_i g_j.


So the elements f_i g_j form a spanning set.

Linearly independent: Suppose that

Σ_{i=1}^{m} Σ_{j=1}^{n} c_{ij} f_i g_j = 0.


Writing the left-hand side as Σ_{j=1}^{n} (Σ_{i=1}^{m} c_{ij} f_i) g_j, each coefficient
Σ_{i=1}^{m} c_{ij} f_i is an element of F. Since g_1, ..., g_n are linearly independent
over F, we must have Σ_{i=1}^{m} c_{ij} f_i = 0 for each j.
Now the linear independence of f_1, ..., f_m over E shows that all the coefficients
c_{ij} are zero.

This completes the proof. 


Theorem 6.11 Let E and F be fields with E ⊆ F. Then the set of all elements
of F which are algebraic over E is a field containing E.


Proof In Chapter 2, we did not specifically develop a subfield test. But we will
be done if we prove that the set A of elements algebraic over E is a subring
containing the inverses of all its non-zero elements. For it is certainly commutative
(since it is contained in the field F) and has an identity (since it contains the
field E). So we have to show that, for any a, b ∈ A, we have a − b ∈ A, ab ∈ A,
and a⁻¹ ∈ A if a ≠ 0.

So choose any a, b ∈ A. Then a is algebraic over E, so [E(a) : E] is finite.
And b is algebraic over E, hence certainly algebraic over E(a); so [E(a, b) : E(a)]
is finite (where we have used E(a, b) as an abbreviation for E(a)(b), the field
generated by a and b over E). By Theorem 6.10, [E(a, b) : E] is finite.
But E(a, b) contains a − b, ab, and a⁻¹ (if a ≠ 0). Each such element c satisfies
E(c) ⊆ E(a, b), so [E(c) : E] is finite and c is algebraic over E by Proposition 6.9.
Hence all these elements lie in A, as required.


Now we let A be the set of all complex numbers which are algebraic over Q. 
Then A is a field. Its elements are called algebraic numbers. Using the 
Fundamental Theorem of Algebra, we can show: 


Theorem 6.12 A is an algebraically closed field. 


The proof depends on the following result. If E is a subfield of F, we say that
F is algebraic over E if every element of F is algebraic over E.


Proposition 6.13 Let E, F, G be fields with E ⊆ F ⊆ G. If F is algebraic over
E, and G is algebraic over F, then G is algebraic over E.


Proof Take any element c ∈ G. Since c is algebraic over F, there is a
polynomial f(x) = x^n + a_{n−1} x^{n−1} + ··· + a_0 in F[x] such that f(c) = 0. Now each of
a_0, a_1, ..., a_{n−1} is algebraic over E. So each of [E(a_0) : E], [E(a_0, a_1) : E(a_0)],
and so on, is finite. So [E(a_0, a_1, ..., a_{n−1}) : E] is finite. If F_0 is the field
E(a_0, a_1, ..., a_{n−1}), then c satisfies the polynomial f with coefficients in F_0;
so c is algebraic over F_0, and [F_0(c) : F_0] is finite. Thus [F_0(c) : E] is finite, and
c lies in F_0(c); so [E(c) : E] is finite, and c is algebraic over E. So, by definition,
G is algebraic over E.


Proof of the Theorem Take any non-constant polynomial f ∈ A[x]. By the
Fundamental Theorem of Algebra, f has a root c ∈ C. Then A(c) is algebraic
over A, which is itself algebraic over Q by definition; so A(c) is algebraic over Q.
But every complex number algebraic over Q is in A; so c ∈ A. Thus f has a root
in A, and we conclude that A is algebraically closed.


6.7 Transcendental numbers. Nothing we have said so far allows us to 
conclude that the field of complex numbers is really different from the field A of 
algebraic numbers. If these two fields were the same, there would be no need to 
go through the construction of the real numbers by Cauchy sequences; everything 
could be obtained from Q by adjoining roots of polynomials. 

To lend an air of mysticism to the proceedings, we define a transcendental 
number to be a complex number which is not algebraic, that is, an element of 
C\ A. The question is: Do transcendental numbers exist? 

The answer is that they do; but there are three entirely different ways of
reaching this conclusion. The first is to show that some very familiar number such
as e (the base of natural logarithms) or π (the ratio of the circumference of a circle
to its diameter) is transcendental. This was achieved for e by Hermite in 1873, and
for π by Lindemann in 1882. (A modification of Hermite’s proof is given below,
involving various simplifications.) The second approach, taken by Liouville in
1844, was to write down a particular number which is easy to prove
transcendental. The third, most revolutionary, approach is that of Cantor in 1874. He
gave an argument which shows that ‘almost all’ numbers are transcendental, but
without exhibiting even a single example!


First proof: The transcendence of e

Proposition 6.14 e is transcendental.


Proof We assume, to the contrary, that e is algebraic, and let

a_n e^n + ··· + a_1 e + a_0 = 0,

where the coefficients a_i are rational. Multiplying this equation by the least
common multiple of the denominators, we may assume that the a_i are integers.
Furthermore, assuming we took the minimal polynomial of e, we have a_0 ≠ 0.
We let p be any prime number, and define the polynomial


f(x) = x^{p−1} (x − 1)^p (x − 2)^p ··· (x − n)^p / (p − 1)!


Now f is a polynomial of degree np + p − 1 with rational coefficients. Let f^{(i)}(x)
be the polynomial obtained by differentiating f(x) i times. Note that f^{(i)}(x) = 0
for i ≥ np + p. We also require the following property:


For all i, and for j = 0, 1, ..., n, f^{(i)}(j) is an integer, and is divisible by p
unless j = 0 and i = p − 1.


The proof uses Leibniz’s rule for differentiating a product. Suppose that j ≠ 0.
To evaluate f^{(i)}(j), we add all of the terms obtained by writing i as a sum of n + 1
non-negative integers m_0, ..., m_n, differentiating the kth factor (x − k)^p (or x^{p−1}
if k = 0) m_k times, multiplying the product of the results by a suitable integer
coefficient (a so-called multinomial coefficient), and dividing by the denominator
(p − 1)! of f. If (x − j)^p is differentiated fewer than p times, there is a factor
(x − j), which vanishes when we substitute x = j; if it is differentiated more
than p times, then it is zero; and if it is differentiated exactly p times, then the
result is p!, which leaves p on division by (p − 1)!. The contribution from all of
the other factors multiplies this by an integer. So the result is a multiple of p.

Now suppose that j = 0. Again, we only obtain a non-zero contribution if the
first term x^{p−1} is differentiated p − 1 times; the resulting term (p − 1)!, divided
by the denominator (p − 1)!, leaves 1. Again, the effect of the other factors is to
multiply the result by an integer. If any other factor is differentiated even once,
there will be a factor of p in the product, and so the result is divisible by p. So
only in the case when j = 0 and i = p − 1 is the result not divisible by p (and it
is definitely not divisible by p in that case, provided that p > n, since its value is
then ((−1)^n n!)^p, which has no prime factor greater than n).
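The divisibility property is easy to test for small n and p. The following Python sketch builds f with exact rational coefficients, differentiates it repeatedly, and records every pair (i, j) for which f^{(i)}(j) fails to be an integer divisible by p. It is a check of particular cases only, and the function names (build_f, derivative, evaluate, exceptions) are hypothetical.

```python
from fractions import Fraction
from math import factorial


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    result = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            result[i + j] += pi * qj
    return result


def build_f(n, p):
    """Coefficients of x^(p-1) (x-1)^p ... (x-n)^p / (p-1)!, lowest degree first."""
    poly = [Fraction(0)] * (p - 1) + [Fraction(1)]  # the factor x^(p-1)
    for k in range(1, n + 1):
        for _ in range(p):
            poly = poly_mul(poly, [Fraction(-k), Fraction(1)])  # times (x - k)
    return [c / factorial(p - 1) for c in poly]


def derivative(poly):
    """Formal derivative of a coefficient list."""
    return [k * c for k, c in enumerate(poly)][1:]


def evaluate(poly, x):
    """Value of the polynomial at x, by Horner's rule."""
    value = Fraction(0)
    for c in reversed(poly):
        value = value * x + c
    return value


def exceptions(n, p):
    """Pairs (i, j), 0 <= j <= n, for which f^(i)(j) is not an integer divisible by p."""
    f = build_f(n, p)
    bad = []
    i = 0
    while f:  # the list becomes empty once i reaches np + p
        for j in range(n + 1):
            value = evaluate(f, j)
            if value.denominator != 1 or value.numerator % p != 0:
                bad.append((i, j))
        f = derivative(f)
        i += 1
    return bad
```

With n = 2 and p = 5 the only exception is (4, 0), where the value is 32. With n = 2 and p = 2 there is no exception at all, because the value at (1, 0) is 4, which is even; this shows why the prime should be taken larger than n.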

Let F(x) be the sum of f(x) and its derivatives of all orders. (As we saw, this
is a finite sum.) Now F′(x) is the sum of all the derivatives; so F′(x) − F(x) =
−f(x). Now


d/dx (e^{−x} F(x)) = e^{−x} (F′(x) − F(x)) = −e^{−x} f(x).


Hence, for j = 0, ..., n,

a_j ∫_0^j e^{−x} f(x) dx = a_j F(0) − a_j e^{−j} F(j).


Multiply this equation by e^j and sum over j = 0, 1, ..., n:


Σ_{j=0}^{n} (a_j e^j ∫_0^j e^{−x} f(x) dx) = F(0) Σ_{j=0}^{n} a_j e^j − Σ_{j=0}^{n} a_j F(j)
                                      = −Σ_{j=0}^{n} Σ_{i=0}^{np+p−1} a_j f^{(i)}(j).


Here we have used the supposed equation Σ_{j=0}^{n} a_j e^j = 0 satisfied by e, together
with the definition of F as the sum of f and its derivatives.

This last equation is the key to the proof. We show that the left-hand side 
can be made arbitrarily close to zero by choosing p to be sufficiently large. We 
also show that the right-hand side is an integer not divisible by p, and hence 
has modulus at least 1. This is a contradiction, which completes the proof. So it 
remains to establish the two assertions. 

First, consider the left-hand side. For 0 ≤ x ≤ n, we have


|f(x)| ≤ n^{np+p−1} / (p − 1)!,

so that

|Σ_{j=0}^{n} (a_j e^j ∫_0^j e^{−x} f(x) dx)| ≤ Σ_{j=0}^{n} |a_j e^j| ∫_0^j n^{np+p−1} / (p − 1)! dx
                                        = Σ_{j=0}^{n} |a_j e^j| j n^{np+p−1} / (p − 1)!,


and the last expression tends to zero as p → ∞.

Now consider the right-hand side; call it R. Our observation about f shows
that R is indeed an integer. Moreover, all terms except −a_0 f^{(p−1)}(0) are integers
divisible by p. So R ≡ −a_0 f^{(p−1)}(0) (mod p). If we choose p greater than both
n and |a_0| then, using the fact that a_0 ≠ 0 (see the start of the proof) and
f^{(p−1)}(0) is not divisible by p, we see that R is not divisible by p, as claimed.
This completes the proof.


This proof is taken (with thanks) from Ian Stewart’s book Galois Theory. 
In the same book, you will find an account of the proof that π is transcendental.
This is similar but more complicated. (In the exercises after this chapter, Stew- 
art gives a number of ‘true or false?’ questions, one of which reads: ‘Sometimes 
the only way to prove a theorem is to pull rabbits out of hats.’ You may 
agree!) 


Second proof: Transcendence of Liouville’s number

Liouville devised a simpler
proof that transcendental numbers exist. He first showed that it is impossible
to find very good rational approximations to irrational algebraic numbers (in a
suitable sense). Then he wrote down a number which has very good rational
approximations, and deduced that this number must be transcendental.

Given any real number a, we can find rational numbers as close to it as we
choose. The rational numbers with denominator q cover the real line with a gap
of 1/q between two consecutive ones, so we can find one of these within 1/q of
any real number. What makes a good rational approximation to a is that the
difference between a and the approximation p/q is much smaller than 1/q.

This is the result about approximating algebraic numbers. 


Theorem 6.15 Let a be an algebraic number whose minimal polynomial has
degree n > 1. Then there is a constant c > 0 such that there are only finitely many
rational numbers p/q satisfying |a − p/q| < 1/cq^n.


Proof Let f(x) be the minimal polynomial of a, the polynomial of least degree
having a as a root. Multiplying by the least common multiple of the
denominators of the coefficients of f, we may assume that all the coefficients are
integers.

Consider the derivative f′. This is a continuous function, and so it is bounded
on any closed interval; say, |f′(x)| < c for x ∈ [a − 1, a + 1].

Now there are only finitely many rational numbers p/q such that 1 <
|a − p/q| < 1/cq^n (this can only hold if q < (1/c)^{1/n}). So any other rational
approximation p/q for which |a − p/q| < 1/cq^n must lie in the interval
[a − 1, a + 1]. We show that there are no such rationals.

The Mean Value Theorem tells us that


f(a) − f(p/q) = (a − p/q) f′(ξ),


where ξ is some real number between a and p/q. Now f(a) = 0, by assumption.
Also f(p/q) ≠ 0: for f is irreducible in Q[x] and has degree greater than 1, and so
has no rational root. Since the coefficients of f are integers, f(p/q) is a rational
number with denominator q^n, so |f(p/q)| ≥ 1/q^n. Also, since ξ is in the interval
[a − 1, a + 1], we have |f′(ξ)| < c.

Putting all this together, we get |a − p/q| > 1/cq^n.


Liouville’s number is the real number with decimal expansion 
a = 0.110001000000000000000001 .. . 


(the ones occur in positions 1!, 2!, 3!, ...). In other words, 


a= S L077 
n=1 


Now a is irrational, since any rational number has a terminating or periodic 
decimal expansion. Let a_n be the nth partial sum of the series for a; that is, 
a_n = Σ_{i=1}^{n} 10^{−i!}. Then a_n is a rational number p/q, where q = 10^{n!}. Also, 
a − a_n < 2·10^{−(n+1)!} = 2/q^{n+1}. So, given n, and given any positive constant c, 
we have |a − a_m| < 1/(c q_m^n) for all sufficiently large m, where q_m is the 
denominator of the rational number a_m. Thus infinitely many rationals satisfy the 
inequality of Theorem 6.15, and so it is impossible that a satisfies a polynomial of 
degree n. Since this is true for all n, we see that a is transcendental. 
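
The estimate a − a_n < 2/q^{n+1} can be checked with exact rational arithmetic. The following Python sketch is only an illustration and not part of the proof: the function names are hypothetical, and a later partial sum a_N stands in for a itself, an assumption forced by finite computation.

```python
from fractions import Fraction
from math import factorial


def liouville_partial_sum(n):
    """Return a_n = sum of 10^(-i!) for i = 1..n, as an exact fraction."""
    return sum(Fraction(1, 10 ** factorial(i)) for i in range(1, n + 1))


def gap_is_small(n, extra_terms=2):
    """Check a_N - a_n < 2 / q^(n+1), where q = 10^(n!) and N = n + extra_terms.

    a_N is used in place of the full (infinite) sum a.
    """
    q = 10 ** factorial(n)
    a_n = liouville_partial_sum(n)
    a_N = liouville_partial_sum(n + extra_terms)
    return a_N - a_n < Fraction(2, q ** (n + 1))


# The denominator of a_n is exactly q = 10^(n!), and the gap bound holds.
for n in range(1, 5):
    assert liouville_partial_sum(n).denominator == 10 ** factorial(n)
    assert gap_is_small(n)
```

Since q grows like 10^{n!}, the approximations a_n are far better than Theorem 6.15 allows for an algebraic number of any fixed degree.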


Third proof: Transcendence of almost all numbers In the late nineteenth cen- 
tury, Cantor was developing the concept of the cardinal number of elements 
in an infinite set, allowing him to compare the sizes of infinite sets. One of the 
triumphs of his theory is the following proof of the existence of transcendental 
numbers, which is technically much simpler than either of the other proofs. 


Definition An infinite set X is countable if there is a bijection between X 
and the set N of natural numbers; that is, if the elements of X can be labelled 
as x_0, x_1, x_2, ... (indexed by the natural numbers). 


Among Cantor’s discoveries was the following result: 


Theorem 6.16 (a) The set A is countable. 
(b) The sets R and C are not countable. 


Proof (a) For A, we have to list all the algebraic numbers in a sequence. 
Each algebraic number is the root of a polynomial f, and by multiplying by the 
least common multiple of the denominators we can assume that f has integer 
coefficients. Now each polynomial has only finitely many roots, so if we can list 
all the polynomials then we can list the algebraic numbers. 

To list the integer polynomials, we define the height of the non-constant 
polynomial a_n x^n + a_{n−1} x^{n−1} + ··· + a_0 to be the positive integer 
n + |a_n| + |a_{n−1}| + ··· + |a_0|. There are only a finite number of polynomials 
with any given height, and so we can list the roots of all the polynomials of 
height 2 (in fact, there are just two such polynomials, namely x and −x), then 
height 3, and so on. 

(We are using here an instance of a general principle, according to which the 
union of a countable number of finite sets is countable.) 
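
The listing by height can be made concrete. The sketch below is an illustration under stated assumptions: a polynomial is represented by its tuple of integer coefficients (a_0, ..., a_n) with a_n ≠ 0 and n ≥ 1, and the function name is hypothetical.

```python
from itertools import product


def polynomials_of_height(h):
    """Yield coefficient tuples (a_0, ..., a_n), n >= 1 and a_n != 0,
    with n + |a_0| + ... + |a_n| == h."""
    for n in range(1, h):
        budget = h - n  # the absolute values of the coefficients must sum to this
        for coeffs in product(range(-budget, budget + 1), repeat=n + 1):
            if coeffs[-1] != 0 and sum(abs(c) for c in coeffs) == budget:
                yield coeffs


# Height 2 gives exactly the polynomials x and -x.
print(sorted(polynomials_of_height(2)))  # [(0, -1), (0, 1)]
```

Running through h = 2, 3, 4, ... and listing the finitely many roots of each polynomial in turn produces every algebraic number at some finite stage, which is all that countability requires.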

(b) We will show that the unit interval (0,1) ⊂ R is uncountable; the same 
assertion then follows for the larger set C. Here are two proofs: 


First proof This is Cantor’s famous ‘diagonal argument’. We represent the real 
numbers in the unit interval as decimals, which may be finite (terminating), or 
recurring, or neither. We assume that all the decimals are infinite, by appending 
zeros to the finite ones (so, for example, 1/2 = 0.50000...). 

Suppose that the unit interval is countable; that is, we can list all its members 
as r_1, r_2, r_3, .... Let r_i have the decimal expansion 0.r_{i1} r_{i2} r_{i3} .... We show 
that the assumption is wrong, by constructing a number s which is not in the 
list. We define s to be the number whose decimal expansion is 0.b_1 b_2 b_3 ..., 
where 


b_i = 7 if r_{ii} = 5, and b_i = 5 otherwise. 


Now, by construction, s ≠ r_i, since the ith decimal digit of s is different from 
that of r_i. So s is not equal to any number in the list, and the assumption that 
we have listed all the real numbers in (0,1) must be wrong. 
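
The diagonal construction is easy to imitate on a finite portion of a list. In the following Python sketch (an illustration only), each number is represented by a string of its decimal digits after the point; the function name and the rule "7 if the diagonal digit is 5, otherwise 5" are assumptions chosen for the example, and any rule that changes the diagonal digit while avoiding 0 and 9 would serve.

```python
def diagonal_digits(rows):
    """rows[i] holds the decimal digits of the (i+1)th listed number.

    Each string must have at least len(rows) digits. Returns the first
    len(rows) digits of a number that differs from every listed number.
    """
    digits = []
    for i, row in enumerate(rows):
        # Change the ith digit of the ith number.
        digits.append("7" if row[i] == "5" else "5")
    return "".join(digits)


listed = ["5000", "1415", "3357", "9995"]
s = diagonal_digits(listed)
print(s)  # 7577
```

The number 0.7577... differs from the ith listed number in its ith digit, so it cannot appear anywhere in the list.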


Second proof Again suppose that r_1, r_2, r_3, ... is a list of all the real numbers 
in (0,1). Take any positive real number ε, and for n = 1, 2, ..., let I_n be the 
interval of length ε/2^n with centre at the point r_n. (Possibly this interval is 
not entirely contained in (0,1), but this does not matter.) Since r_n ∈ I_n, our 
assumption that all the numbers in (0,1) have been listed implies that the interval 
(0,1) is contained in the union of the intervals I_n. But (0,1) has length 1, while 
the total length of the union of the intervals I_n is at most Σ_{n≥1} ε/2^n = ε. (The 
length may be smaller since the intervals may overlap.) Choosing ε < 1 gives a 
contradiction. 


Since C is uncountable but A is countable, there is at least one transcen- 
dental number. But the proof gives no recipe to find one. (The first proof 
appears constructive: it shows that, given a list of all the algebraic numbers, if 
we knew the ith digit in the decimal expansion of the ith algebraic number in the 
list, we could construct a transcendental number. But this is only a theoretical 
possibility.) 

However, the proof gives more. We can see that the set of transcendental 
numbers, like the set of complex numbers, is uncountable. (If it were countable, 
and was enumerated as b_1, b_2, b_3, ..., then we could take a list of all the algebraic 
numbers as in part (a) of the theorem, say a_1, a_2, a_3, ..., and produce a list of 
all the complex numbers as a_1, b_1, a_2, b_2, a_3, ..., contrary to part (b).) 

The second proof shows a little more. The algebraic numbers in the unit inter- 
val, or indeed any countable set, can be covered by a union of intervals with total 
length less than any preassigned positive number ε. So, if we choose a number 
at random from the unit interval, with probability 1 it will be transcendental. 
(In the language of measure theory, we say that the algebraic numbers form a 
null set.) 


Remark The similar fact that the rational numbers form a null set seems 
to have been known to the remarkable fourteenth-century French mathemati- 
cian Nicole Oresme. In his work De proportionibus proportionum, he states the 
proposition 


It is probable that two proposed unknown ratios are incommensurable. 


On the basis of this and related ideas he argued that the future is unpre- 
dictable and hence that astrology is futile. See Crosby’s book cited earlier, or 
Karl Petersen’s Ergodic Theory, for more information about Oresme. 


6.8 Ruler-and-compass constructions. The topic of ruler-and-compass 
constructions has fascinated mathematicians since the days of the ancient Greeks. 
The three famous problems (trisecting a general angle, duplicating a cube, and 
squaring a circle), are all now known to be impossible; but this has not halted 
the steady stream of ‘solutions’ which arrive at mathematics departments all 
over the world! 

The general set-up can be described as follows: We are given a finite set S 
of points in the Euclidean plane, and wish to construct another finite set T. To 
perform the construction, we are allowed two tools: 


• a ruler or straightedge, which can be used to draw a line of arbitrary 
  length through any two points already constructed; 

• a compass, which can be used to draw a circle whose centre is any 
  constructed point, and whose radius is equal to the distance between any two 
  constructed points. 


New points are constructed as the intersections of two lines, of a line and a circle, 
or of two circles. The construction process is required to take only finitely many 
steps. 

The set S should contain at least two points, since otherwise nothing can 
be constructed. It is conventional to assume that two of the points in S are the 
origin (0,0) and the point (1,0). (This convention simply sets the position and 
scale of the coordinate axes.) 

The works of Euclid, and the versions of them which have been fed to many 
generations of schoolchildren in the past, contain the details of many construc- 
tions: bisecting a line segment or an angle, constructing an equilateral triangle, 
a square, or regular pentagon, dividing a line segment in the golden ratio, and so 
on. But we show that certain things cannot be constructed by giving a necessary 
condition. 

If S is a finite set of points in R², we define Q(S) to be the field generated by 
the coordinates of all points in S; that is, the smallest field containing all these 
coordinates. 


Theorem 6.17 If T is constructible from S, then the coordinates of the points 
in T lie in a field F containing Q(S) such that [F : Q(S)] is a power of 2. 


Proof We look at one step in the construction, where one new point is cre- 
ated, and show that its coordinates lie in an extension of the preceding field 
with degree at most 2. Then, by the multiplicative formula for degrees of 
extensions (Theorem 6.10), the final field has degree a power of 2 over the 
initial one. 

Suppose that we have constructed a set U of points. Their coordinates lie in 
the field E = Q(U). Now we may draw a line joining two of these points, or a 
circle with centre in U and radius equal to the distance between two points in U. 
Coordinate geometry tells us how to find the equation of such a line or circle; 
we claim that all coefficients in such an equation belong to E. For: 


(a) The line joining (a,b) to (c,d) has equation (x − a)(d − b) = (y − b)(c − a). 
(b) The circle with centre (a,b) and radius r has equation (x − a)² + (y − b)² = r². 
If r is the distance between (c,d) and (e,f), then r² = (c − e)² + (d − f)². 


Now suppose that we construct a new point as the intersection of two such 
curves. 


(a) If both curves are lines, then we have to solve two linear equations with 
coefficients in E; the solution lies in E. 

(b) If one is a line and the other a circle, we can use the equation of the line 
to find y in terms of x (or vice versa), and substitute into the equation of 
the circle; we obtain a quadratic equation for x (or y). If the quadratic is 
reducible in E[x], the solution lies in E; otherwise, it lies in an extension 
E’ of E with [E’ : E] = 2. Once x has been found, the equation of the line 
yields y without further enlarging the field. 

(c) Suppose that both curves are circles, with equations x² + y² + ax + by + c = 0 
and x² + y² + dx + ey + f = 0. Any solution must also satisfy the difference 
of these equations, namely (a − d)x + (b − e)y + (c − f) = 0. This is the 
equation of a line; so we are back in the situation of case (b). 
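
Cases (b) and (c) can be traced numerically. The Python sketch below is an illustration under stated assumptions: a circle x² + y² + ax + by + c = 0 is represented by the triple (a, b, c), the function names are hypothetical, and floating-point arithmetic replaces exact field arithmetic. The only irrational operation used is a single square root, which mirrors the claim that each step extends the field by degree at most 2.

```python
from math import isclose, sqrt


def radical_line(circle1, circle2):
    """Subtract the two circle equations; returns (A, B, C) with A*x + B*y + C = 0."""
    (a, b, c), (d, e, f) = circle1, circle2
    return (a - d, b - e, c - f)


def solve_quadratic(qa, qb, qc):
    """Real roots of qa*t^2 + qb*t + qc = 0, assuming qa != 0."""
    disc = qb * qb - 4 * qa * qc
    if disc < 0:
        return []
    root = sqrt(disc)
    return [(-qb + root) / (2 * qa), (-qb - root) / (2 * qa)]


def intersect_circles(circle1, circle2):
    """Intersection points of two circles, found via case (b) of the proof."""
    A, B, C = radical_line(circle1, circle2)
    a, b, c = circle1
    if B != 0:
        # Write the line as y = m*x + k and substitute into the first circle.
        m, k = -A / B, -C / B
        xs = solve_quadratic(1 + m * m, 2 * m * k + a + b * m, k * k + b * k + c)
        return [(x, m * x + k) for x in xs]
    if A != 0:
        # Vertical line x = -C/A; solve the circle equation for y.
        x = -C / A
        ys = solve_quadratic(1, b, x * x + a * x + c)
        return [(x, y) for y in ys]
    return []  # concentric circles give no line


# Unit circles centred at (0,0) and (1,0): the classical equilateral triangle.
print(intersect_circles((0, 0, -1), (-2, 0, 0)))
```

Here the two new points are (1/2, ±√3/2), whose coordinates lie in Q(√3), an extension of Q of degree 2. Tangent circles produce the same point twice.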


The theorem is proved. 

We use this result to show that the classical problems are insoluble. 


Duplicating the cube This problem gives the side of a cube of volume 1, 
and asks for the side of a cube of volume 2 to be constructed. We may take 
S = {(0,0), (1,0)}, so that Q(S) = Q. If the required length can be constructed, 
then we can construct the point (2^{1/3}, 0), and so the final field F contains 
Q(2^{1/3}). 

However, the polynomial x³ − 2 is irreducible over Q, since 2^{1/3} is irrational. 
So [Q(2^{1/3}) : Q] = 3. Since [F : Q] = 2^n for some n, we conclude that 
[F : Q(2^{1/3})] = 2^n/3, an obvious impossibility. 


Trisecting the angle What is required in this problem is a general procedure 
to trisect any angle. This can be refuted by showing just one angle that cannot 
be trisected. (Some angles can be trisected, for example, a right angle: we can 
construct an angle of 30° by bisecting the angle of an equilateral triangle.) 

We show that the angle 60° cannot be trisected. 

Note that, if an angle θ can be constructed, then the length cos θ is 
constructible. Let c = cos 20°. Then 


4c³ − 3c = cos 60° = 1/2, 


using the formula cos 3θ = 4 cos³ θ − 3 cos θ. So c is a root of the polynomial 
8x³ − 6x − 1. This polynomial is easily seen to be irreducible. Then the proof 
continues exactly as for duplicating the cube. 
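
One way to see the irreducibility of these cubics: a cubic over Q that factorises must have a linear factor, and hence a rational root; and a rational root p/q in lowest terms of an integer polynomial has p dividing the constant term and q dividing the leading coefficient. The Python sketch below (an illustration; the function names and the coefficient-list representation [a_0, ..., a_n] are assumptions) runs this rational root test.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a non-zero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Rational roots of a_0 + a_1 x + ... + a_n x^n.

    coeffs = [a_0, ..., a_n] must be integers with a_0 != 0 and a_n != 0.
    """
    candidates = {Fraction(sign * p, q)
                  for p in divisors(coeffs[0])
                  for q in divisors(coeffs[-1])
                  for sign in (1, -1)}
    return sorted(r for r in candidates
                  if sum(c * r ** k for k, c in enumerate(coeffs)) == 0)


print(rational_roots([-2, 0, 0, 1]))    # x^3 - 2
print(rational_roots([-1, -6, 0, 8]))   # 8x^3 - 6x - 1
```

Both calls return an empty list, so neither cubic has a rational root, and each is irreducible over Q.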


Squaring the circle The problem is to construct a square whose area is equal 
to that of a circle of unit radius; in other words, to construct a length of √π. 
Since π is transcendental, by the result of Lindemann, it cannot lie in any exten- 
sion of Q of finite degree, whether a power of 2 or not. So the construction is 
impossible. 


It may be the fact that this argument uses the relatively complicated proof of 
the transcendence of π that leads to the existence of an army of ‘circle-squarers’ 
who do not accept it. 


Exercise 6.7 Prove that, if a is transcendental over E, then E(a) is isomorphic to the 
field of fractions of the polynomial ring E[x]. 


Exercise 6.8 Prove that it is not possible to construct a regular 7-gon with ruler 
and compass. [Hint: The real number a = 2 cos(2π/7) satisfies the cubic equation 
x³ + x² − 2x − 1 = 0.] 


More about sets 


We used some arguments about the cardinalities of infinite sets in the preced- 
ing section. This chapter finishes with a general account of Cantor’s theory of 
cardinality, and a look at the Axiom of Choice. 


6.9 Cardinality. We used some basic notions of Cantor’s theory of cardinal 
number to prove that almost all numbers are transcendental. Here is a more 
detailed account, mostly without proofs. 

We constructed each natural number as a set: the number n is the set 
{0,1,...,n — 1}. We say that a set A has cardinality n if there is a bijec- 
tion between A and n. (This is well defined; there cannot be a bijection between 
two different natural numbers.) We say that A is finite if it has cardinality n for 
some natural number n, and infinite otherwise. Furthermore, A is countable 
if there is a bijection between A and N, and is uncountable if it is infinite but 
not countable. 

One of the foundations of the theory is the Schröder–Bernstein theorem: 


Theorem 6.18 (Schröder–Bernstein Theorem) If there are injective maps 
from A to B and from B to A, then there is a bijection between A and B. 


Using this, we can write |A| = |B| if there is a bijection from A to B; |A| ≤ |B| 
if there is an injective map from A to B; and |A| < |B| if |A| ≤ |B| but there is 
no bijection between them. [We have not defined the cardinality of a set, simply 
these three relations between pairs of sets.] Clearly, if A ⊆ B, then |A| ≤ |B|. 
Now the Schröder–Bernstein theorem states that, if |A| ≤ |B| and |B| ≤ |A|, 
then |A| = |B|. 

An example of Cantor’s diagonal argument is the following result, where P(A) 
is the power set of A, the set of all subsets of A. 


Theorem 6.19 |A| < |P(A)|. 


Proof We can define an injection from A to P(A) by mapping the element 
a ∈ A to the set {a}. 
Suppose that F is a bijection between A and P(A). Let 


B = {a ∈ A : a ∉ F(a)}. 


Then B ∈ P(A), so by assumption B = F(b) for some b ∈ A. Now we ask: is 
b ∈ B? If so, then by definition of B we have b ∉ F(b) = B; while if not, then 
b ∈ F(b) = B. So either assumption leads to a contradiction. Hence no such 
bijection can exist. 
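
For a small finite set the argument can be verified exhaustively. The Python sketch below (an illustration with hypothetical function names) runs through every map F from A to P(A) and confirms that the set B = {a ∈ A : a ∉ F(a)} is never a value of F, so F is never onto.

```python
from itertools import combinations, product


def power_set(elements):
    """All subsets of the given elements, as frozensets."""
    elements = list(elements)
    return [frozenset(c)
            for r in range(len(elements) + 1)
            for c in combinations(elements, r)]


def diagonal_set(A, F):
    """B = {a in A : a not in F(a)}, where F is a dict from A to subsets of A."""
    return frozenset(a for a in A if a not in F[a])


def no_map_is_onto(A):
    """Check that, for every map F: A -> P(A), the diagonal set is missed by F."""
    A = list(A)
    subsets = power_set(A)
    for images in product(subsets, repeat=len(A)):
        F = dict(zip(A, images))
        if diagonal_set(A, F) in images:
            return False
    return True


print(no_map_is_onto(range(3)))  # True: all 8^3 = 512 maps are checked
```

Of course the proof covers infinite sets as well, where no exhaustive search is possible.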


So the cardinal numbers of sets go on for ever; there is no largest set! 
Some of Cantor’s discoveries are summarised in the following theorem: 


Theorem 6.20 (a) The union and Cartesian product of countable sets are 
countable. 
(b) For any natural number n ≥ 1, |N^n| = |N|. 
(c) A subset of a countable set is finite or countable. 
(d) |Q| = |Z| = |N|, so Z and Q are countable. 
(e) |R| = |P(N)| > |N|, so R is uncountable. 
(f) |C| = |R|. 
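
Part (b) for n = 2 can be made explicit by a pairing function. The Python sketch below uses the Cantor pairing function, which is one standard choice and is an assumption of this example rather than anything prescribed above; N is taken to include 0, and the function names are hypothetical.

```python
def pair(x, y):
    """Cantor pairing: number the pairs along the diagonals x + y = 0, 1, 2, ..."""
    s = x + y
    return s * (s + 1) // 2 + y


def unpair(z):
    """Inverse of pair: recover (x, y) from z."""
    s = 0
    # Find the largest s with s(s+1)/2 <= z; this is the diagonal containing z.
    while (s + 1) * (s + 2) // 2 <= z:
        s += 1
    y = z - s * (s + 1) // 2
    return (s - y, y)


print([pair(x, y) for x, y in [(0, 0), (1, 0), (0, 1), (2, 0)]])  # [0, 1, 2, 3]
```

Iterating, the map (x, y, z) ↦ pair(pair(x, y), z) gives a bijection from N³ to N, and so on for larger n.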


In general, there may be sets A and B for which neither |A| ≤ |B| nor 
|B| ≤ |A| holds; that is, there may be no injective map in either direction between 
A and B. A set-theoretical principle called the Axiom of Choice ensures that 
any two sets are comparable. We discuss this principle further in the next section. 
It is independent of the other axioms for set theory; that is, it can be neither 
proved nor disproved from them. 

Cantor posed the famous Continuum Hypothesis, according to which 
there does not exist a set A satisfying |N| < |A| < |R|; in other words, |R| 
is the next cardinal number after |N|. This also turns out to be independent of 
the other axioms (even including the Axiom of Choice). 


6.10 The Axiom of Choice. This section contains a self-contained account 
of the Axiom of Choice and its best-known application in algebra. We will use 
this in the next chapter. 

A family of sets is a collection (X_i : i ∈ I) of sets, where I is an index set. 
Formally, we can regard it as a function F whose domain is the index set I, with 
F(i) = X_i for all i ∈ I. A choice function for the family is a function f whose 
domain is the index set I, satisfying f(i) ∈ X_i for all i ∈ I. Informally, f chooses 
a member of each set X_i. 

Of course, for a choice function to exist, it is necessary that each set X_i 
should be non-empty. The Axiom of Choice asserts that this condition is also 
sufficient: 


Any family of non-empty sets has a choice function. 


The Axiom of Choice cannot be proved or disproved from the other axioms 
in a standard list of axioms for set theory, such as those of Zermelo and Fraenkel. 
Note, however, that we do not need to invoke it to choose an element from a 
single non-empty set, or even to choose elements from finitely many non-empty 
sets; the other axioms justify doing this. 

Bertrand Russell’s explanation shows what is going on here. Suppose that 
you have a wardrobe containing infinitely many pairs of shoes. Can you choose 
one shoe from each pair? Yes, just choose the left shoes. But if, instead, you 
have infinitely many pairs of socks, then can you choose one from each pair? The 
Axiom of Choice asserts that such a selection exists, even if (as here) there is no 
rule for doing it. 

It is this non-constructive nature of the Axiom of Choice which makes its 
use somewhat controversial. Most mathematicians accept it, but perhaps more 
because they cannot do without its remarkable consequences than because of 
any philosophical reason. 

We abbreviate the Axiom of Choice to AC. 

We now discuss two equivalent principles. First, some definitions: 


Definitions A partial order on a set A is a reflexive, antisymmetric, and 
transitive relation on A, usually written as ≤. In other words, 


• if a ≤ b and b ≤ c then a ≤ c; 
• a ≤ b and b ≤ a if and only if a = b. 


We write a < b to mean a ≤ b and a ≠ b; and b > a means the same as a < b. 
The relation is a total order if in addition the trichotomy law holds: 


• for any a and b, exactly one of a < b, a = b, a > b holds. 


A chain in a partially ordered set is a subset which is totally ordered by the 
relation. An upper bound for a subset B is an element a satisfying b ≤ a for 
all b ∈ B. A maximal element of A is an element a ∈ A satisfying a ≮ b for all 
b ∈ A. The terms lower bound and minimal element are defined similarly. 
A maximal or minimal element of a chain is usually called a greatest element 
or least element respectively. 

A well-order of A is a total order with the property that every non-empty 
subset has a least element. (Apologies for the ugly grammar, which is a back- 
formation from ‘well-ordered set’.) 


Theorem 6.21 The following statements are equivalent: 


(Axiom of Choice, AC) Any family of non-empty sets has a choice function. 

(Zorn’s Lemma, ZL) If a partial order has the property that every chain has an 
upper bound, then there exists a maximal element. 

(Well-ordering Principle, WO) Every set has a well-order. 


Proof (AC) implies (ZL): The idea of the proof is that, in order to find a 
maximal element in a partial order, we start anywhere, and move upwards until 
we come to one. This obviously works in a finite set; for an infinite set, more 
care is required. 

Assume AC, and let (A, ≤) be a partially ordered set in which every chain 
has an upper bound. Consider the family of upper bounds of chains, and let f 
be a choice function for it; that is, for any chain C, f(C) is an upper bound 
for C. Assume, for a contradiction, that A has no maximal element. Then, for 
any a € A, the set of elements greater than a is non-empty; again by AC, we 
may let g(a) be an element greater than a. 

Construct a chain B as follows. Start by including any element a ∈ A in B. 
At any stage, if b is the greatest element of B so far, add g(b) to B; if B has 
no greatest element, add f(B) to B. Each move retains the property that B is 
a chain. 

Now, by assumption, the chain B resulting from this construction has an 
upper bound. But this is a contradiction, since if B has an upper bound b, we 
can add either g(b) (if b ∈ B) or f(B) to it, so we had not finished. 

(ZL) implies (WO): Take a set A which we wish to well-order. Let X be the 
set of ordered pairs (B, <_B), for which B is a subset of A and <_B is a well- 
order on B. Now define a relation ≼ on X by the rule that (B, <_B) ≼ (C, <_C) if 
B ⊆ C, <_B is the restriction of <_C to B, and every element of C \ B is greater 
than every element of B. (Without the last condition, the union of a chain of 
well-orders need not be a well-order.) Check that ≼ is a partial order on 
X. Check also that every chain in (X, ≼) has an upper bound (take the union of 
all the sets and orderings in the chain). Assuming (ZL), X has a maximal element, 
say (B, <_B). Now we must have B = A; for if not, choose an element a ∉ B and 
define it to be greater than all elements of B to obtain a larger member of X. 
So <_B is a well-order on A. 

(WO) implies (AC): We are given a family (X_i : i ∈ I) of non-empty sets. By 
(WO), there is a well-order of their union ⋃_{i∈I} X_i. Now define f(i) to be the least 
element of X_i with respect to this well-order; then f is a choice function. 


Remark Convince yourself of the non-constructive nature of AC by trying to 
construct a well-order of R. 


Here is a typical (and very important) application of AC to algebra. 


Theorem 6.22 (Krull’s Theorem) Assume AC. Then every ring with 
identity has a maximal ideal. 


Proof Let R be a ring with identity. Let A be the set of proper ideals of R. 
The relation ≤ on A defined by I ≤ J if I ⊆ J is a partial order. 

Suppose that B is a chain in A, and let K be the union of the ideals in B. 
We claim that K ∈ A; that is, K is a proper ideal of R. 


• If a, b ∈ K, then a ∈ I and b ∈ J for some I, J ∈ B. Since B is a chain, 
  either I ⊆ J or J ⊆ I holds; suppose the former. Then a, b ∈ J, so a − b ∈ J. 
  Hence a − b ∈ K. 

• If a ∈ K and r ∈ R, then a ∈ I for some I ∈ B. Then ar, ra ∈ I, so 
  ar, ra ∈ K. Thus K passes the Ideal Test. 

• To show that K is a proper ideal, suppose for a contradiction that K = R. 
  Then 1 ∈ K, so 1 ∈ I for some I ∈ B. But this is impossible, since members 
  of B are proper ideals by definition. (This is the only point where we use 
  the fact that R has an identity.) 


Thus every chain in A has an upper bound in A. By Zorn’s Lemma, A has a 
maximal element; that is, R has a maximal (proper) ideal. 


Remark Wilfrid Hodges has shown that the converse holds; the conclusion of 
Krull’s Theorem is equivalent to the Axiom of Choice. 


The Well-ordering Principle gives us a new proof technique: transfinite 
induction. Let < be a well-ordering of a non-empty set A. Then A has a least 
element, which we will call 0. Moreover, if a is an element of A which is not the 
greatest element, then the set of elements greater than a has a least element 
s(a), called the successor of a. 

Now suppose that B is a subset of A. Assume that 


• 0 ∈ B; 

• if a ∈ B, then s(a) ∈ B; 

• if b ≠ 0, b is not a successor, and every element smaller than b is in B, then 
  b ∈ B. 


Then we can conclude that B = A. For if not, let c be the smallest element of 
A \ B. Then c cannot be 0; it cannot be a successor; and then the final clause 
shows that it cannot exist. 

Here is an application, promised in the preceding subsection. 


Proposition 6.23 Assume AC. Then for any two sets A and B, either there 
is an injective map from A to B, or there is an injective map from B to A. In 
other words, either |A| ≤ |B|, or |B| ≤ |A|. 


Proof We may assume that both A and B are well-ordered. We attempt to 
define a map f : A → B as follows: 


• f(0_A) = 0_B; 

• if f(a) = b, then f(s(a)) = s(b); 

• if a ≠ 0 and a is not a successor, then f(a) is the least element of B not of 
  the form f(a′) for any a′ < a. 


If we succeed in defining f, then it is an injective map from A to B. If we fail, 
it is because at a certain point we have used up all the elements of B; then we 
have an injective map from a subset of A onto B, whose inverse is an injective 
map from B to A. 


Exercise 6.9 Prove Theorem 6.20. 


Exercise 6.10 Assuming AC, show that any infinite set A contains a countable subset. 
[Hint: Take a choice function f for the family of non-empty subsets of A. Now 
define by induction a_0 = f(A) and 


a_n = f(A \ {a_0, ..., a_{n−1}}) 


for all positive integers n.] 


Exercise 6.11 Assume AC. Show that, if R is a ring with identity, and J is a proper 
ideal of R, then J is contained in a maximal ideal of R. 


Exercise 6.12 Assume AC. Show that any Boolean ring R is isomorphic to a subring 
of the ring of subsets of some set X. 
[Hint: Take X to be the set of all maximal ideals of R.] 


Exercise 6.13 A subset B of an arbitrary vector space V is a basis if every finite 
subset of B is linearly independent, and every vector of V is in the span of some finite 
subset of B. Assuming AC, show that every vector space has a basis. 


Exercise 6.14 Assume AC. Prove that there exists a discontinuous function f : R → R 
satisfying 


f(x + y) = f(x) + f(y) 


for all x, y ∈ R. Hint: Show first that every continuous solution to the displayed equation 
has the form f(x) = cx for some real number c. Now regard R as a vector space over 
the field Q. Show that the displayed equation is equivalent to the linearity of f, and 
use a basis for the vector space to construct a discontinuous solution. 


7 Further topics 


In this chapter, we delve a little further into groups, rings, and fields, and examine 
some other algebraic systems which have been studied. 


Further group theory 


The emphasis in this section is on finite groups. We prove Sylow’s theorems on the 
existence of subgroups of prime power order, and the Jordan–Hölder Theorem, 
which reduces the problem of describing all finite groups to describing the finite 
simple groups and fitting them together. There is also some discussion of these 
two sub-problems. 


7.1 Permutation groups and group actions. As we already know, a per- 
mutation group is a group whose elements are permutations; that is, a subgroup 
of a symmetric group. It is more in keeping with the spirit of abstract algebra 
that we should not tie down the elements of a group to being permutations. 
Accordingly, we define an action of a group G on a set Ω. This will associate 
to every group element a permutation, so that the permutations arising 
will form a permutation group. But we do not require that the correspondence 
between group elements and permutations is one-to-one. The formal definition 
is as follows: 

An action of a group G on a set Ω is a function μ : Ω × G → Ω with the 
following two properties: 


(GA1) μ(μ(x, g), h) = μ(x, gh) for all x ∈ Ω, g, h ∈ G. 
(GA2) μ(x, 1) = x for all x ∈ Ω, where 1 is the identity of G. 


These axioms are obviously related to the closure and identity laws for the group 
G. You might have expected an axiom corresponding to the inverse law, 


(GA3) μ(μ(x, g), g⁻¹) = μ(μ(x, g⁻¹), g) = x for all x ∈ Ω, g ∈ G; 


but in fact this follows from (GA1) and (GA2) (Exercise 7.1). 
You should think of μ(x, g) as the image of x under the permutation of Ω 
corresponding to g. The next result guarantees that it does indeed work like that. 


Proposition 7.1 (a) For any g ∈ G, the map π_g : Ω → Ω defined by 
xπ_g = μ(x, g) is a permutation. 
(b) The map θ : G → Sym(Ω) defined by gθ = π_g is a homomorphism. 
(c) Conversely, given any homomorphism θ : G → Sym(Ω), there is an 
action of G on Ω given by μ(x, g) = x(gθ). 


Proof (a) The ‘derived axiom’ (GA3) says that the functions π_g and π_{g⁻¹} are 
inverses of one another. A function that has an inverse is a permutation. 
(b) This says that π_{gh} = π_g π_h, which is the content of (GA1). 
(c) Straightforward checking shows this. 


Examples We will use three examples in which the action is derived from the 
abstract group structure. In each case the axioms (GA1) and (GA2) are easily 
checked. 


(a) Let H be a subgroup of G. Let Ω be the set of all right cosets of 
H in G. Define an action by μ(Ha, g) = H(ag). This is the action of right 
multiplication. 


(b) Define an action of G on itself (that is, Ω = G) by the rule μ(x, g) = 
g⁻¹xg. This is the action of conjugation. 


(c) Let Ω be the set of all subgroups of G. Then G acts on Ω by conjugation: 
μ(H, g) = g⁻¹Hg. 


Now we develop a little of the theory of group actions. 

Let μ be an action of G on Ω. Define a relation ∼ on Ω by the rule that 
x ∼ y if there exists g ∈ G with μ(x, g) = y. The reflexive, symmetric, and 
transitive laws for ∼ follow almost immediately from the axioms (GA2), (GA3), 
and (GA1) for an action (in other words, from the properties of identity, inverses, 
and closure). So ∼ is an equivalence relation. Its equivalence classes are called 
orbits. So x and y lie in the same orbit if the permutation corresponding to 
some element of G carries x to y. The set Ω decomposes into a disjoint union of 
orbits. 

We say that the action is transitive if there is just one orbit, and intran- 
sitive otherwise. In our examples, the action of G by right multiplication on 
the right coset space is transitive, whereas (if G ≠ {1}) the action of G on 
itself by conjugation is not; the orbits for the latter action are the conjugacy 
classes of G. 
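
As a concrete check of these definitions, the following Python sketch (an illustration with hypothetical helper names) takes G to be the symmetric group on {0, 1, 2}, with permutations stored as tuples and composed left to right to match the right-action convention used here. It verifies (GA1) and (GA2) for the conjugation action and computes the orbits, which are the conjugacy classes.

```python
from itertools import permutations


def compose(g, h):
    """The product gh: apply g first, then h."""
    return tuple(h[g[i]] for i in range(len(g)))


def inverse(g):
    inv = [0] * len(g)
    for i, image in enumerate(g):
        inv[image] = i
    return tuple(inv)


def conjugate(x, g):
    """The conjugation action: mu(x, g) = g^{-1} x g."""
    return compose(compose(inverse(g), x), g)


def orbits(points, group, action):
    """Partition the points into orbits under the given action."""
    remaining, result = set(points), []
    while remaining:
        x = next(iter(remaining))
        orbit = {action(x, g) for g in group}
        result.append(orbit)
        remaining -= orbit
    return result


G = list(permutations(range(3)))
identity = tuple(range(3))

# (GA1) and (GA2) for the conjugation action.
assert all(conjugate(conjugate(x, g), h) == conjugate(x, compose(g, h))
           for x in G for g in G for h in G)
assert all(conjugate(x, identity) == x for x in G)

print(sorted(len(o) for o in orbits(G, G, conjugate)))  # [1, 2, 3]
```

The three orbits consist of the identity, the two 3-cycles, and the three transpositions; since there is more than one orbit, the action is intransitive.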

The stabiliser of an element x ∈ Ω is the set 


{g ∈ G : μ(x, g) = x} 


of elements of G for which the corresponding permutation fixes x. It is 
denoted G_x. 


Theorem 7.2 (Orbit–Stabiliser Theorem) Given an action of G on Ω, 
and x ∈ Ω: 


(a) the stabiliser G_x is a subgroup of G; 
(b) there is a bijection between the orbit of x and the set of right cosets of G_x 
in G. 


Proof (a) Apply the subgroup test: the composition or inverse of permutations 
fixing x fixes x. (Argue formally with the action μ if you prefer.) 
(b) We show that, for any y belonging to the orbit of x, the set 


X(x, y) = {g ∈ G : μ(x, g) = y} 


is a right coset of H = G_x, and every right coset arises in this way. First, since y 
lies in the orbit of x, there is an element g of G such that μ(x, g) = y. Then every 
element of the right coset Hg maps x to y. Conversely, if g′ maps x to y, then 
g′g⁻¹ fixes x, so lies in G_x = H; then Hg′ = Hg. So the set X(x, y) is a right 
coset of G_x. Conversely, every right coset G_x g is contained in (and hence equal 
to) X(x, μ(x, g)). 


Remark If G is finite, the size of the orbit of x is equal to |G : G_x| = |G|/|G_x|. 
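
The formula in this remark can be tested on a small case. The Python sketch below (an illustration with hypothetical helper names) uses the conjugation action of the symmetric group on {0, 1, 2, 3} on itself, with permutations stored as tuples and composed left to right, and checks that the orbit size times the stabiliser order equals |G| = 24 for every element.

```python
from itertools import permutations


def compose(g, h):
    """The product gh: apply g first, then h."""
    return tuple(h[g[i]] for i in range(len(g)))


def inverse(g):
    inv = [0] * len(g)
    for i, image in enumerate(g):
        inv[image] = i
    return tuple(inv)


def conjugate(x, g):
    """The conjugation action: mu(x, g) = g^{-1} x g."""
    return compose(compose(inverse(g), x), g)


def orbit(x, group, action):
    return {action(x, g) for g in group}


def stabiliser(x, group, action):
    return [g for g in group if action(x, g) == x]


G = list(permutations(range(4)))
for x in G:
    size = len(orbit(x, G, conjugate))
    order = len(stabiliser(x, G, conjugate))
    assert size * order == len(G)

print(sorted({len(orbit(x, G, conjugate)) for x in G}))  # [1, 3, 6, 8]
```

Here the stabiliser of x is its centraliser, and the orbit sizes 1, 3, 6, 8 are the sizes of the conjugacy classes (two different classes have size 6).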


In our examples, 

(a) In the action of G on the right cosets of H by right multiplication, the 
stabiliser of the coset H is the subgroup H. Show that the stabiliser of the coset 
Hx is x⁻¹Hx. The proposition is clear in this case. 

(b) In the action of G by conjugation, the stabiliser of x is its centraliser 
C_G(x), and we recover the formula |G : C_G(x)| for the size of the conjugacy 
class. 

(c) In the action of G by conjugation on its subgroups, the stabiliser of a 
subgroup H is its normaliser 


N_G(H) = {g ∈ G : g⁻¹Hg = H}. 


This subgroup contains H, and indeed it is the largest subgroup of G in which 
H is contained as a normal subgroup. If H is a normal subgroup of G, then 
N_G(H) = G. 


Remark It can be shown that, with a suitably defined notion of isomorphism 
of actions, 


(a) every transitive action is isomorphic to the action by right multiplication 
on the right cosets of a subgroup; 

(b) the actions on the right cosets of two subgroups H and K are isomorphic 
if and only if H and K are conjugate. 


This gives us a complete classification of the transitive actions, and hence of 
arbitrary actions, of a given group G; we just have to classify the subgroups up 
to conjugacy. 


Using the Orbit–Stabiliser Theorem, we can give a formula for the number 
of orbits of a finite group acting on a finite set. If G acts on Ω, let fix(g) denote 
the number of elements ω ∈ Ω which satisfy μ(ω, g) = ω (that is, the number of 
points fixed by g). 


Theorem 7.3 (Orbit-Counting Lemma) The number of orbits of G on Ω 
is given by the formula 


(1/|G|) Σ_{g∈G} fix(g). 


Proof We count in two ways the number of pairs (ω, g) for which ω ∈ Ω, g ∈ G, 
and g fixes ω. 
On one hand, g fixes fix(g) points, so the number of pairs is 


Σ_{g∈G} fix(g). 


On the other hand, take ω ∈ Ω. The number of group elements fixing ω is the 
order |G_ω| of its stabiliser. Let O be the orbit containing ω. Now every point of 
O has stabiliser of the same order as ω, namely |G_ω|; so the number of pairs with 
ω ∈ O is |O|·|G_ω| = |G|, by the Orbit–Stabiliser Theorem. That is, every orbit 
contributes |G| pairs to the sum; so the total number of pairs is |G| multiplied 
by the number of orbits. 

Equating the two values and dividing by |G| gives the result. 


Example How many ways are there of painting the faces of a cube with three 
colours (say red, green, and blue), if two colourings differing by a rotation are 
identified? 

A colouring is a function from the six faces of the cube to the set 
{red, green, blue} of colours; so there are $3^6$ colourings. Let $\Omega$ be the set of 
these colourings. We are asked to count the number of orbits on $\Omega$ of the group G 
of rotations of the cube. 

The group G has order 24 and consists of the following elements: 


(a) the identity; 
(b) three rotations through 180° about axes through face centres; 
(c) six rotations through ±90° about axes through face centres; 
(d) eight rotations through ±120° about axes through vertices; 
(e) six rotations through 180° about axes through midpoints of edges. 


Type (a) fixes all $3^6$ colourings. For any other type, a colouring is fixed if 
and only if faces in the same cycle get the same colour, so the number of fixed 
colourings will be $3^c$, where c is the number of cycles of the rotation acting on 
the faces of the cube. These numbers are 4, 3, 2, and 3 in cases (b)-(e) respectively. 

So the number of orbits is 


$$\frac{1}{24}\left(3^6 + 3 \cdot 3^4 + 6 \cdot 3^3 + 8 \cdot 3^2 + 6 \cdot 3^3\right) = \frac{1368}{24} = 57.$$
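
The count can be checked by machine. The following is a minimal sketch, not part of the text: it assumes the faces are labelled by their outward unit normals and that the rotation group is generated by quarter turns about two perpendicular axes (the names rot_z, rot_x, and count_orbits are chosen here for illustration). It then applies the Orbit-Counting Lemma directly.

```python
# Faces of the cube, labelled by outward unit normals (an illustrative choice).
FACES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def rot_z(v):
    """Quarter turn about the z-axis: (x, y, z) -> (-y, x, z)."""
    x, y, z = v
    return (-y, x, z)

def rot_x(v):
    """Quarter turn about the x-axis: (x, y, z) -> (x, -z, y)."""
    x, y, z = v
    return (x, -z, y)

def as_perm(rotation):
    """Permutation of face indices induced by a rotation of space."""
    return tuple(FACES.index(rotation(f)) for f in FACES)

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)

def generate(gens):
    """Close a list of permutations under composition (finite case)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def cycle_count(p):
    """Number of cycles of p, counting fixed points as cycles of length 1."""
    seen, cycles = set(), 0
    for i in range(len(p)):
        if i not in seen:
            cycles += 1
            while i not in seen:
                seen.add(i)
                i = p[i]
    return cycles

def count_orbits(num_colours):
    """Orbit-Counting Lemma: average of num_colours**cycles over the group."""
    group = generate([as_perm(rot_z), as_perm(rot_x)])
    total = sum(num_colours ** cycle_count(g) for g in group)
    return total // len(group)

print(count_orbits(3))
```

The same function with two colours instead of three gives the number of essentially different two-colourings of the cube.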


We deduce from the Orbit-Counting Lemma a simple but useful result of 
Jordan. 
Corollary 7.4 (Jordan’s Theorem) (a) Let G act transitively on the finite 
set $\Omega$, where $|\Omega| > 1$. Then there is an element of G which fixes no point 
of $\Omega$. 
(b) Let H be a proper subgroup of a finite group G. Then 


$$\bigcup_{g \in G} g^{-1}Hg \neq G.$$


Proof (a) By the Orbit-Counting Lemma, the average number of fixed points 
of elements of G is equal to 1 (the number of orbits). The identity fixes more 
than one point; so some element must fix fewer than one point, that is, no point 
at all. 

(b) Let G act on the set of right cosets of H by right multiplication. The action 
is transitive, and there is more than one coset because H is a proper subgroup. 
The point stabilisers are the conjugates $g^{-1}Hg$. So the element 
guaranteed by (a) lies in none of these conjugates. 
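
As an illustration of part (a), here is a small sketch (an example chosen here, not taken from the text) for the symmetric group on four points acting on {0, 1, 2, 3}. The action is transitive, so the average number of fixed points should be 1 and some permutation should fix no point.

```python
from itertools import permutations

def fixed_points(p):
    """Number of points i with p(i) = i."""
    return sum(1 for i, j in enumerate(p) if i == j)

# All 24 permutations of {0, 1, 2, 3}; the action on points is transitive.
G = list(permutations(range(4)))

average = sum(fixed_points(g) for g in G) / len(G)
fixed_point_free = [g for g in G if fixed_points(g) == 0]

print(average, len(fixed_point_free))
```

The fixed-point-free elements are exactly those lying in no point stabiliser, which is the content of part (b) for H the stabiliser of a point.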


7.2  Sylow’s Theorem. We next prove what is arguably the most important 
theorem about finite groups, Sylow’s Theorem. It is motivated by the question: 
Does Lagrange’s Theorem have a converse? Lagrange’s Theorem asserts that 
the order of any subgroup of G divides the order of G. But not every divisor 
occurs. The alternating group $A_4$ has order 12 but has no subgroup of order 6 
(Exercise 3.36). 


Theorem 7.5 (Sylow’s Theorem) Let G be a group of order $n = p^a m$, where 
p is prime and p does not divide m. 


(a) G contains a subgroup of order $p^a$. 

(b) The number of subgroups of order $p^a$ is congruent to 1 mod p, and all these 
subgroups are conjugate. 

(c) Any subgroup of G of order a power of p is contained in a subgroup of 
order $p^a$. 


Proof The proof involves the ideas of group actions. 
For (a), let $\Omega$ be the set of all subsets of G of cardinality $p^a$. This is a very 
large set, of cardinality $\binom{p^a m}{p^a}$. We define an action $\mu$ of G on $\Omega$ by ‘right 
multiplication’: 

$$\mu(X, g) = Xg = \{xg : x \in X\}.$$

Now $\Omega$ splits into orbits for this action. We note that the sets making up any 
orbit must cover the whole of G: for, if $x \in X$, then $g \in \mu(X, x^{-1}g)$. So the size 
of the orbit is at least $n/p^a = m$, with equality if and only if the stabiliser is a 
subgroup of order $p^a$. Conversely, if X is a subgroup of order $p^a$, then the orbit 
of X consists of the right cosets of X, of which there are just m. If the size of an 
orbit is larger than m, then (as it divides $p^a m$) it must be a multiple of p. 
So there are two kinds of orbits: 


(a) orbits of size m, whose stabilisers have order $p^a$; 
(b) orbits of size divisible by p. 


If we can show that the size of $\Omega$ is not divisible by p, then we can conclude that 
orbits of the first type exist, and hence that G has subgroups of order $p^a$. 

Now the size of $\Omega$ is $\binom{p^a m}{p^a}$. It is possible to show, using some number theory, 
that this number is not divisible by p. But this can be done with a trick. The 
size of $\Omega$ is completely independent of which group of order n we have chosen. 
So consider the cyclic group of order n. We know that it has a unique subgroup 
of order $p^a$, hence has just one orbit of type (a). It follows that 


$$\binom{p^a m}{p^a} \equiv m \pmod{p},$$

so this number is not divisible by p. Thus part (a) of the Theorem is proved. 
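
The congruence for the binomial coefficient is easy to test numerically. The sketch below is only a sanity check over a small, arbitrarily chosen range of parameters (the helper name congruence_holds is an assumption made here), not a proof.

```python
from math import comb

def congruence_holds(p, a, m):
    """Check that C(p^a * m, p^a) is congruent to m modulo p."""
    return comb(p**a * m, p**a) % p == m % p

# Small primes, small exponents, and values of m not divisible by p.
cases = [(p, a, m)
         for p in (2, 3, 5, 7)
         for a in (1, 2, 3)
         for m in range(1, 20)
         if m % p != 0]

print(all(congruence_holds(p, a, m) for p, a, m in cases))
```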

To prove parts (b) and (c), we take a different action. Let $\Omega$ be the set of all 
subgroups of G of order $p^a$ (we know by (a) that this set is non-empty), and let 
G act on $\Omega$ by conjugation: $\mu(X, g) = g^{-1}Xg$. 

Let Q be any non-trivial subgroup of G whose order is a power of p. We 
consider the action of Q on $\Omega$ obtained by restricting the action of G. 

Suppose first that $|Q| = p^a$, so that Q is one of the members of $\Omega$. Clearly Q 
fixes itself, and so lies in an orbit of size 1. We claim that Q fixes no other member 
of $\Omega$. If Q fixes a different subgroup X in this action, then by Exercise 7.3, we 
have that QX is a subgroup, and 


$$|QX| = |Q| \cdot |X| / |Q \cap X| = p^a \cdot p^a / p^b = p^{2a-b},$$


where $|Q \cap X| = p^b$. But since the subgroups Q and X are different, their 
intersection is a proper subgroup of each, and so $b < a$, whence $2a - b > a$. But 
this is impossible, since no higher power of p than $p^a$ divides $|G| = p^a m$. 

So all the other orbits of Q have sizes which are greater than 1 but divide 
$|Q| = p^a$, and hence are divisible by p. It follows that $|\Omega| \equiv 1 \pmod{p}$. 

What about the orbits of G? These are obtained by glueing together orbits of 
Q in some way. So the orbit containing Q has size congruent to 1 mod p, and all 
the others have size congruent to 0 mod p. Could there be more than one orbit? 
If P lies in a different orbit to Q, then the same argument would show that the 
orbit of P has size congruent to 1 mod p and the others 0 mod p, which is clearly 
impossible. So there is only one orbit for the action of G by conjugation; in other 
words, all subgroups of order $p^a$ are conjugate. Thus, (b) is proved. 

Finally, consider (c). Let Q have order $p^b$, where 0 < b < a. Every orbit of Q 
on $\Omega$ has size dividing |Q|, and hence a power of p. Since $|\Omega| \equiv 1 \pmod{p}$, at least 
one orbit must have size 1. If this orbit is {P}, then as above we find that 
$P \cap Q = Q$, whence $Q \subseteq P$. 
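
Part (b) can be observed in a small case. The following sketch works in the symmetric group on four points, of order $24 = 2^3 \cdot 3$ (an example chosen here for illustration). It relies on the assumption, true in this group, that every subgroup of order 8 or 3 is generated by at most two elements, so a search over pairs of generators finds them all.

```python
from itertools import permutations

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def generate(gens):
    """Subgroup generated by a list of permutations (finite case)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return frozenset(group)

S4 = list(permutations(range(4)))

def subgroups_of_order(order):
    """Subgroups of S4 of the given order generated by at most two elements."""
    found = set()
    for a in S4:
        for b in S4:
            H = generate([a, b])
            if len(H) == order:
                found.add(H)
    return found

def all_conjugate(subgroups):
    """True if the conjugates of one member are exactly the whole family."""
    first = next(iter(subgroups))
    conjugates = {frozenset(compose(compose(inverse(g), h), g) for h in first)
                  for g in S4}
    return conjugates == subgroups

sylow2 = subgroups_of_order(8)
sylow3 = subgroups_of_order(3)
print(len(sylow2), len(sylow3))
```

The two counts are congruent to 1 mod 2 and 1 mod 3 respectively, and all_conjugate confirms that each family forms a single conjugacy class.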
7.3 p-groups. If a group has prime power order, it has a number of special 
properties not shared by arbitrary groups. These will be described later in this 
section. I have chosen to give a composite result, which proves the basic property 
of such groups at the same time as proving the first part of Sylow’s Theorem. 
First, we prove Cauchy’s Theorem. 


Theorem 7.6 (Cauchy’s Theorem) If a prime number p divides the order 
of a group G, then G contains an element of order p. 


Proof Let 


$$\Omega = \{(g_1, \ldots, g_p) \in G^p : g_1 \cdots g_p = 1\}.$$


We define an action of the cyclic group $C_p$ of order p on $\Omega$ by 


$$(g_1, \ldots, g_p)^\pi = (g_p, g_1, \ldots, g_{p-1}),$$


where $\pi$ generates $C_p$ (that is, $\pi$ shifts the coordinates cyclically). We have to 
check that $\pi$ maps $\Omega$ to itself. If $(g_1, \ldots, g_p) \in \Omega$, then $g_p = (g_1 \cdots g_{p-1})^{-1}$, so 
$g_p g_1 \cdots g_{p-1} = 1$ as required. 

Now $C_p$ has orbits of size 1 and p on $\Omega$. An orbit of size 1 contains an element 
$(g, g, \ldots, g)$ of $\Omega$, where $g^p = 1$; any other element of $\Omega$ lies in an orbit of size p. 
Since $|\Omega| = |G|^{p-1}$ is divisible by p, the number of orbits of size 1 (and hence 
the number of solutions of $g^p = 1$) is also a multiple of p. One of these solutions 
is g = 1, so there must be at least p − 1 more; these are elements of order p. 
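
The counting in this proof can be carried out explicitly for a small group. The sketch below uses the symmetric group on three points, of order 6, as an example chosen here; the function name count_tuples is likewise an assumption. It lists the tuples with product 1 and the constant tuples among them.

```python
from functools import reduce
from itertools import permutations, product

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)

G = list(permutations(range(3)))   # the symmetric group on three points
IDENTITY = tuple(range(3))

def count_tuples(p):
    """Return (size of the set of p-tuples with product 1, number of g with g^p = 1)."""
    omega = [t for t in product(G, repeat=p)
             if reduce(compose, t, IDENTITY) == IDENTITY]
    constant = [t for t in omega if all(x == t[0] for x in t)]
    return len(omega), len(constant)

print(count_tuples(2), count_tuples(3))
```

In both cases the first number is $|G|^{p-1}$ and the second is a multiple of p, as the argument predicts.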


The principle used here states: 


A group of p-power order, acting on a set of size divisible by p, 
has the property that the number of fixed points is divisible by p; 
hence, if there is at least one fixed point, then there are at least p. 


We use this in the next proof. 


Theorem 7.7 Let G be a group of order $p^a m$, where p is a prime not 
dividing m. Then, for $0 \leq i \leq a$, 


$A_i$: G contains a subgroup of order $p^i$; 
$B_i$: if $i < a$, then any subgroup of order $p^i$ is contained normally in a subgroup 
of order $p^{i+1}$. 


Proof The argument is an induction: we show that 


$$A_0 \Rightarrow B_0 \Rightarrow A_1 \Rightarrow \cdots \Rightarrow B_{a-1} \Rightarrow A_a$$


(the last statement $B_a$ is vacuous). In other words, we have to start the induction 
by showing $A_0$, and the inductive step has two parts: $A_i \Rightarrow B_i$ and $B_i \Rightarrow A_{i+1}$. 
Statement $A_0$ is easy: the identity subgroup will do. 
Suppose that $A_i$ is true, with $i < a$. Let P be a subgroup of order $p^i$, and 
consider the action of P by right multiplication on its own right cosets. The 
number of cosets is $p^{a-i}m$, a multiple of p; since P has p-power order and fixes 
the coset P itself, it must fix another coset, say Px. Now the statement Pxg = Px 
for all $g \in P$ shows that x belongs to the normaliser of P. So $|N_G(P) : P|$ is equal 
to the number of fixed points of P in this action, hence divisible by p. 

By Cauchy’s Theorem, $N_G(P)/P$ contains a subgroup $\bar{Q}$ of order p. This 
subgroup corresponds to a subgroup Q of $N_G(P)$ of order $p|P| = p^{i+1}$ in which 
P is normal. 

Finally, it is trivial that $B_i$ implies $A_{i+1}$ for $i < a$. So our inductive proof is 
complete. 


Corollary 7.8 (a) (First part of Sylow’s Theorem) A group of order $p^a m$, 
where p is a prime not dividing m, contains subgroups of order $p^a$. 

(b) A group P of prime power order $p^a$ has the property that any proper 
subgroup is properly contained in its normaliser; so there is a chain 


$$P_0 < P_1 < \cdots < P_a = P$$

of subgroups, where $|P_i| = p^i$ and each is a normal subgroup of the next. 


The last part can be strengthened. We use the term p-group for a group 
whose order is a power of a prime p. 


Theorem 7.9 (a) The centre of a non-trivial p-group is non-trivial. 
(b) If $|P| = p^a$, then P has a chain 


$$P_0 < P_1 < \cdots < P_a = P$$


of subgroups, where $|P_i| = p^i$ and each is a normal subgroup of P. Moreover, 
$P_{i+1}/P_i \cong C_p$. 


Proof Let P act on itself by conjugation. By our general principle stated 
earlier, the number of fixed points (which is the order of the centre Z(P)) is greater 
than 1. This proves (a). For (b), we proceed by induction. We take $P_0 = \{1\}$, 
and $P_1$ the subgroup generated by an element of order p in Z(P). (Every 
subgroup of Z(P) is normal in P.) If we have constructed $P_i$, then we take $\bar{Q}$ to 
be a normal subgroup of order p in $P/P_i$, and let $P_{i+1}$ be the corresponding 
subgroup of P. 


7.4 The Jordan-Hölder Theorem. In the remainder of this section, we 
examine the structure of finite groups. It is not possible to give the kind of 
description we gave for finite fields, where there is a unique field of each prime 
power order; groups are much more complicated. First, we prove the 
Jordan-Hölder Theorem, according to which any finite group is built from a unique 
collection of simple groups. (A group G is simple if it is not the identity group 
but its only normal subgroups are the trivial ones, the whole group and the 
identity.) This breaks the problem of describing groups into two parts: describing 
the simple groups, and describing how they can be fitted together. The first part 
has been completed, but the length and complexity of the proof have meant that 
no self-contained account has yet appeared. The second problem is fairly well 
understood, but we do not have a complete ‘solution’ to it. 

Let G be a finite group. It is possible to choose a normal subgroup $G_1$ of G 
which is maximal; that is, which is contained in no normal subgroup of G except 
for itself and G. By the Second Isomorphism Theorem, the normal subgroups of 
$G/G_1$ correspond to normal subgroups of G containing $G_1$; hence there are just 
two of them, namely $G/G_1$ and the identity. In other words, $G/G_1$ is simple. Now 
we repeat the procedure with $G_1$. We end up with a sequence 


$$G = G_0 > G_1 > G_2 > \cdots > G_r = \{1\},$$


with the properties that, for i = 1, 2, ..., r, we have $G_i \trianglelefteq G_{i-1}$ and $G_{i-1}/G_i$ 
is simple. The series displayed is called a composition series for G, and the 
simple groups $G_{i-1}/G_i$ are called the composition factors of G. 

Note that we obtain a list of composition factors (so that the same simple 
group may occur more than once), and also that the list of composition factors is 
associated with a particular composition series. Note also that the product of the 
orders of the composition factors of G is equal to the order of G. Furthermore, 
given any descending series of subgroups, each normal in the preceding one, we 
can refine the series by adding more terms to obtain a composition series, which 
is just such a series in which no more terms can be inserted (since each term is 
a maximal normal subgroup of its predecessor). 

For example, $S_4$ has a composition series 


$$S_4 > A_4 > V_4 = C_2 \times C_2 > C_2 > \{1\},$$


with composition factors $S_4/A_4 \cong C_2$, $A_4/V_4 \cong C_3$, $V_4/C_2 \cong C_2$, and 
$C_2/\{1\} \cong C_2$; in other words, $C_2$ three times and $C_3$ once. We have 
$|S_4| = 4! = 2^3 \cdot 3$. 

The Jordan-Hölder Theorem asserts that, no matter what composition 
series we choose for G, we will obtain the same composition factors (each repeated 
the same number of times in both series). 


Theorem 7.10 (Jordan-Hölder Theorem) Let 


$$G = G_0 > G_1 > G_2 > \cdots > G_r = \{1\}$$


and 


$$G = H_0 > H_1 > H_2 > \cdots > H_s = \{1\}$$


be two composition series for the finite group G. Then the lists of composition 
factors obtained from the two series are the same. In particular, the series have 
the same length (that is, r = s). 
Proof Our proof is by induction on the order of G. The induction begins with 
the trivial group, which has the empty list of composition factors. So we assume, 
inductively, that the theorem holds for all groups smaller than G. 

If $G_1 = H_1$, then deleting G from the given series gives two composition series 
for $G_1$. By the induction hypothesis, they have the same lists of composition 
factors (and the same length): adding $G/G_1$ gives the list of composition factors 
of G, and we are done. So we may suppose that $G_1 \neq H_1$. 

Let $N = G_1 \cap H_1$. Then N is the intersection of two normal subgroups of G, 
and so is a normal subgroup. Also, $G_1H_1$ is a normal subgroup containing $G_1$, 
so $G_1H_1 = G_1$ or $G_1H_1 = G$ (by the maximality of $G_1$ as normal subgroup). 
The first alternative is impossible, since it implies $H_1 \leq G_1$, whence $H_1 = G_1$ 
(since $H_1$ is also maximal normal), contrary to assumption. So $G_1H_1 = G$. 

From this it follows that 


$$G/G_1 = G_1H_1/G_1 \cong H_1/(G_1 \cap H_1) = H_1/N,$$


and similarly $G/H_1 \cong G_1/N$. 
Now let $N = N_0 > N_1 > \cdots > N_t = \{1\}$ be a composition series for N. Let 
$\mathcal{L}$ be the list of composition factors derived from this series. Adding $G_1$ at the 
start of the series gives a composition series for $G_1$. By the inductive hypothesis, 
$\mathcal{L} \cup \{G_1/N\}$ is the list of composition factors for $G_1$ and is the same as obtained 
from the composition series $G_1 > G_2 > \cdots > \{1\}$. Hence the list of composition 
factors of G obtained from the first composition series is $\mathcal{L} \cup \{G_1/N, G/G_1\}$. 
In the same way, the list obtained from the second series is $\mathcal{L} \cup \{H_1/N, G/H_1\}$. 
But we have already shown that $G/G_1 \cong H_1/N$ and $G/H_1 \cong G_1/N$. So the 
two lists are the same. 


7.5 Soluble groups. The Jordan-Hölder Theorem suggests that finite simple 
groups are the ‘building blocks’ of finite groups; any finite group is built 
from a unique collection of simple groups (its composition factors). There are 
two fundamentally different kinds of simple groups, and groups built entirely 
from the first type have very different properties from those with some factors 
of the second type. 

The first type of simple group consists of the cyclic groups of prime order. By 
Lagrange’s Theorem, these groups have no non-trivial subgroups at all, and so 
certainly they are simple. A simple abelian group is necessarily cyclic of prime 
order. 

The second type consists of the non-abelian simple groups, which we will 
discuss further in the next section. 

A finite group G is soluble if all its composition factors are cyclic of prime 
order. The strange name for this class of groups will remain mysterious until 
we discuss the work of Galois in Chapter 8, connecting these groups with poly- 
nomial equations which are ‘soluble by radicals’. In this section, we provide 
the groundwork for that theory, by giving some alternative characterisations of 
soluble groups. 
First, some more definitions. If x and y are elements of the group G, their 
commutator, written [x, y], is the element $x^{-1}y^{-1}xy$. (Note that x and y 
commute, that is, xy = yx, if and only if [x, y] = 1.) The commutator 
subgroup, or derived group, of G, is the subgroup generated by all 
commutators in G; it is written $G'$ or [G, G]. Finally, the derived series of G is the 
series 


$$G^{(0)} \geq G^{(1)} \geq G^{(2)} \geq \cdots$$


defined by $G^{(0)} = G$, $G^{(i+1)} = [G^{(i)}, G^{(i)}]$ for $i \geq 0$. 


Lemma 7.11 For any group G, the subgroup [G, G] is normal in G and 
G/[G, G] is abelian. Moreover, if $N \trianglelefteq G$ and G/N is abelian, then $N \geq [G, G]$. 


Proof Calculation shows that 


$$g^{-1}[x, y]g = [g^{-1}xg, g^{-1}yg].$$


Hence conjugation by g merely permutes the commutators, and fixes the 
subgroup they generate. Hence [G, G] is normal in G. Let H = [G, G]. Then, for any 
$x, y \in G$, we have 


$$[xH, yH] = [x, y]H = H,$$


since $[x, y] \in H$; and thus G/H is abelian. This argument shows further that G/N 
is abelian if N contains H. Conversely, suppose that G/N is abelian. Then, for 
all x, y, 


$$N = [xN, yN] = [x, y]N,$$


so $[x, y] \in N$. Since this holds for all x and y, we have $[G, G] \leq N$. 


Theorem 7.12 The following conditions for the finite group G are equivalent: 


(a) there is a series 


$$G = G_0 > G_1 > \cdots > G_r = \{1\}$$


of subgroups of G with $G_i \trianglelefteq G_{i-1}$ and $G_{i-1}/G_i$ cyclic of prime order for 
i = 1, ..., r; 


(b) there is a series 


$$G = H_0 > H_1 > H_2 > \cdots > H_s = \{1\}$$


of subgroups of G with $H_i \trianglelefteq H_{i-1}$ and $H_{i-1}/H_i$ abelian for i = 1, ..., s; 
(c) $G^{(d)} = \{1\}$ for some d. 
Proof Clearly (a) implies (b). Moreover, if (b) holds, then we can refine the 
given series to a composition series; the composition factors are abelian simple 
groups, and hence are cyclic of prime order, so (a) holds. 

Suppose that (b) holds. Since $H_0/H_1$ is abelian, the lemma implies that 
$G^{(1)} = [G, G] \leq H_1$. This is the first stage of a proof by induction that $G^{(i)} \leq H_i$ 
for all i. Suppose that this holds for i = k. Then all commutators [g, h], for 
$g, h \in G^{(k)}$, lie in $H_k'$; so $G^{(k+1)} \leq H_k'$. Also, $H_k' \leq H_{k+1}$, since $H_k/H_{k+1}$ is 
abelian. So the inclusion $G^{(i)} \leq H_i$ holds also for i = k + 1. By induction, it 
holds for all i; and so $G^{(s)} \leq H_s = \{1\}$. 

Conversely, if $G^{(d)} = \{1\}$, then the normal series 


$$G = G^{(0)} \geq G^{(1)} \geq \cdots \geq G^{(d)} = \{1\}$$


has abelian factors, so (b) holds. 
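
Condition (c) lends itself to direct computation. The sketch below is illustrative only, with the symmetric groups on four and five points chosen here as test cases: it computes the orders of the terms of the derived series, stopping when the series stabilises. The series reaches the identity exactly when the group is soluble.

```python
from itertools import permutations

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def commutator(x, y):
    """The element x^-1 y^-1 x y."""
    return compose(compose(inverse(x), inverse(y)), compose(x, y))

def generate(gens):
    """Subgroup generated by a non-empty set of permutations (finite case)."""
    identity = tuple(range(len(next(iter(gens)))))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return frozenset(group)

def derived_series_orders(group):
    """Orders of G^(0), G^(1), ... until the series stops descending."""
    series = [frozenset(group)]
    while True:
        H = series[-1]
        D = generate({commutator(x, y) for x in H for y in H})
        if D == H:          # trivial group or perfect group: no further descent
            break
        series.append(D)
    return [len(H) for H in series]

S4 = list(permutations(range(4)))
S5 = list(permutations(range(5)))
print(derived_series_orders(S4), derived_series_orders(S5))
```

For the symmetric group on four points the series descends to the identity; for five points it stalls at a subgroup of order 60, so that group is not soluble.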


Corollary 7.13 A group of prime power order is soluble. 


Proof Theorem 7.9 and Theorem 7.12. 


Condition (a) is our definition of solubility. So the theorem gives three 
equivalent characterisations of finite soluble groups. 

For infinite groups, conditions (b) and (c) are equivalent (check that the 
proof given above is valid), but not equivalent to (a) (which only holds for finite 
groups!). So, in general, we take (b) or (c) as the definition of solubility. 


7.6 Simple groups. If all finite simple groups were cyclic of prime order, 
then all finite groups would be soluble. Unfortunately, this is not so. The 
alternating group $A_5$ is simple and non-abelian. 

As mentioned earlier, the finite simple groups have been determined. This 
theorem is probably the most complex ever proved, and is well beyond the scope 
of this text. Even the description of the groups in the classification is more than 
I can attempt here, except in broad outline. 

According to the classification, the finite non-abelian simple groups are of 
three types. First, there are the alternating groups $A_n$ for $n \geq 5$. (We will prove 
their simplicity below.) 

Then there are the so-called ‘groups of Lie type’, which are defined as cer- 
tain groups of matrices over finite fields. The easiest type to describe are the 
projective special linear groups. The special linear group SL(n,q) con- 
sists of all n x n matrices over the finite field GF(q) which have determinant 1; 
the group operation is matrix multiplication. The subgroup Z of this group 
consisting of all scalar matrices with determinant 1 (that is, all $cI_n$, where $c \in GF(q)$ 
and $c^n = 1$), is normal; we set PSL(n, q) = SL(n, q)/Z. Now it can be shown 
that PSL(n, q) is simple for all $n \geq 2$ and all prime powers q, with the exception 
of PSL(2,2) and PSL(2, 3). 

Finally, there are just 26 so-called ‘sporadic groups’, which have no uniform 
definition and have to be constructed individually. The smallest has order 7920; 
the largest, approximately $10^{54}$. 
Theorem 7.14 The alternating group $A_n$ is simple for all $n \geq 5$. 


Proof We use a similar method to the one we employed to find the normal 
subgroups of $S_5$. We recall from the proof of Proposition 3.30 that the conjugacy 
classes in $S_5$ have sizes 1, 24, 15, 20, 10, 30, and 20, where the first four consist 
of even permutations and make up $A_5$. Obviously, a conjugacy class in $S_5$ will 
be a union of conjugacy classes in $A_5$: we have to see how they split up. 


Lemma 7.15 Let C be a conjugacy class in $S_n$ which is contained in $A_n$. 
Then either C is a conjugacy class in $A_n$, or C is the union of two conjugacy 
classes $C'$ and $C''$ of the same size in $A_n$. The first alternative holds if and only 
if some member of C commutes with an odd permutation. 


Proof Since $S_n$ acts transitively on C by conjugation, there is a bijection 
between elements of C and cosets of the stabiliser of an element $c \in C$. Now, in 
this case, the stabiliser of c is 


$$\{g \in S_n : g^{-1}cg = c\} = \{g \in S_n : cg = gc\} = C_{S_n}(c),$$


the centraliser of c. 

If no odd permutation commutes with c, then $H = C_{S_n}(c) \leq A_n$, and so half 
of the cosets of H lie in $A_n$ and the other half in $S_n \setminus A_n$. So the orbit of $A_n$ 
containing c contains just half of the conjugacy class C, and C splits into two 
classes of equal size. 

On the other hand, suppose that H contains an odd permutation. Then 
$C_{A_n}(c) = C_{S_n}(c) \cap A_n$ is a subgroup of index 2 in H. Now the size of the 
$A_n$-conjugacy class is the index of the stabiliser, which is 


$$|A_n : C_{A_n}(c)| = |S_n : C_{A_n}(c)|/2 = |S_n : C_{S_n}(c)| = |C|.$$


So C is a conjugacy class in $A_n$. 


Now the conjugacy class of size 15 consists of elements with cycle structure 
(2,2,1) (products of two transpositions). Each of these commutes with a 
transposition, so the class does not split in $A_5$. The class of size 20 consists of 
3-cycles; the 3-cycle (1 2 3) commutes with the transposition (4 5), and again the 
class does not split. The class of size 24 consists of 5-cycles. It can be shown that 
a 5-cycle commutes only with its own powers, all of which are even permutations. 
So this class splits into two classes of size 12 in $A_5$. 

We conclude that the conjugacy classes in $A_5$ have sizes 1, 15, 20, 12, and 
12. A normal subgroup is a union of conjugacy classes including the class {1}, 
and its order divides 60. But no sum of a proper sublist of these sizes which 
includes 1 and at least one other term is a divisor of 60. So 
$A_5$ has no normal subgroups except itself and the identity, and therefore it is 
simple. 
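
The class sizes and the divisibility check can be confirmed by brute force. This is a sketch under the obvious encoding of permutations as tuples; the helper names class_sizes and candidate_normal_orders are chosen here for illustration.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0

def alternating_group(n):
    return [p for p in permutations(range(n)) if is_even(p)]

def class_sizes(group):
    """Sorted sizes of the conjugacy classes of a group of permutations."""
    remaining, sizes = set(group), []
    while remaining:
        c = next(iter(remaining))
        cls = {compose(compose(inverse(g), c), g) for g in group}
        sizes.append(len(cls))
        remaining -= cls
    return sorted(sizes)

def candidate_normal_orders(sizes, group_order):
    """Sums 1 + (some but not all other class sizes) that divide the group order."""
    others = sizes[1:]                    # sizes is sorted, so sizes[0] == 1
    found = set()
    for r in range(1, len(others)):       # proper, non-empty selections only
        for combo in combinations(others, r):
            total = 1 + sum(combo)
            if group_order % total == 0:
                found.add(total)
    return sorted(found)

A5 = alternating_group(5)
sizes = class_sizes(A5)
print(sizes, candidate_normal_orders(sizes, len(A5)))
```

An empty list of candidates means that no proper non-trivial union of classes has an order dividing 60. Running the same functions on the alternating group on four points produces the candidate order 4, which is the order of the Klein group.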

Now we show by induction that $A_n$ is simple for $n \geq 5$. The inductive 
hypothesis is that $A_{n-1}$ is simple. So let N be a non-trivial normal subgroup of $A_n$. 
Then $NA_{n-1}$ is a subgroup of $A_n$ containing $A_{n-1}$. If $NA_{n-1} = A_{n-1}$, then 
$N \leq A_{n-1}$; since N is normal, it is contained in all of the point stabilisers 
(the conjugates of $A_{n-1}$), and so N = 1, contrary to assumption. Now $A_{n-1}$ is 
a maximal subgroup of $A_n$, so it follows that $NA_{n-1} = A_n$. 

The subgroup $A_{n-1} \cap N$ is normal in $A_{n-1}$. By the inductive hypothesis, 
$A_{n-1} \cap N = A_{n-1}$ or $A_{n-1} \cap N = \{1\}$. In the first case, $N \geq A_{n-1}$, and the 
equation $NA_{n-1} = A_n$ implies $N = A_n$. In the second case, we have |N| = n, 
and so $A_n$ has a conjugacy class of size at most n − 1. By the lemma, $S_n$ has a 
conjugacy class of size at most 2(n − 1), which can be seen to be impossible for 
$n \geq 6$ (Exercise 7.13). So this case cannot occur, and we are done. 


7.7 Extensions. The cyclic group $C_4$ and the Klein group $V_4$ both have 
composition factors $C_2$ (twice). So the composition factors alone are not enough 
to determine a group. If simple groups are the building bricks of finite groups, 
then extension theory is the mortar used to stick them together. 

The general problem of extension theory is as follows: given groups A and B, 
describe all possible groups G which have a normal subgroup N such that N is 
isomorphic to A while G/N is isomorphic to B. (Such groups are called exten- 
sions of A by B.) It is clear that a complete solution to this problem, together 
with the list of finite simple groups, would allow us to describe finite groups 
completely. Such a complete solution does not exist. 

In this section, we examine an important special case of the extension 
problem, when the group A is abelian. We will see that two pieces of informa- 
tion are needed to define an extension of A by B: first, an action of B on A; and 
then a factor set, a certain function from B x B to A. Unfortunately, different 
actions and factor sets may define isomorphic groups, so we do not get informa- 
tion as precise as we would like; but we will prove some results based on this 
approach. 

Let A be an abelian normal subgroup of the group G, with $G/A \cong B$. For 
any $g \in G$, the map $\sigma_g : A \to A$ defined by $a\sigma_g = g^{-1}ag$ is well defined (by the 
normality of A), and is an automorphism of A. Now the map $\theta : G \to \mathrm{Aut}(A)$ 
defined by $g\theta = \sigma_g$ is a homomorphism, since it is easily checked that 
$\sigma_g\sigma_h = \sigma_{gh}$. Now A is abelian, so $\sigma_a = 1$ for $a \in A$. Thus A lies in the kernel of 
the homomorphism $\theta$. It follows that the value of $g\theta$ is the same for all elements 
of the coset Ag. So $\theta$ induces a homomorphism $\phi : G/A \cong B \to \mathrm{Aut}(A)$. This 
homomorphism is the action of B on A. 

There is a special kind of extension of A by B known as a semidirect 
product or split extension. This is an extension G containing a complement to 
A, a subgroup H which satisfies AH = G, $A \cap H = \{1\}$. Now a semidirect 
product is determined up to isomorphism by the action. For we have $H \cong G/A \cong B$, 
and every element of G is uniquely expressible in the form ha for $h \in H$, $a \in A$. 
We have 


$$(h_1a_1)(h_2a_2) = (h_1h_2)(h_2^{-1}a_1h_2)a_2 = (h_1h_2)(a_1(h_2\phi)a_2).$$
So, if we identify H with B (to which it is isomorphic), and define an operation 
$\circ$ on $B \times A$ by the rule 


$$(b_1, a_1) \circ (b_2, a_2) = (b_1b_2, a_1(b_2\phi)a_2),$$


the result is a group isomorphic to G. 

The semidirect product of A by B using the action homomorphism $\phi : B \to 
\mathrm{Aut}(A)$ is written as $A \rtimes B$, or (if we want to stress the action) $A \rtimes_\phi B$. 

However, not every extension is split. For example, C4 and C2 x C2 are both 
extensions of C2 by C2 with trivial action (since both are abelian); the second, but 
not the first, is a semidirect product. So we must look further. The complement 
H is a special kind of set of coset representatives for A in G. So we choose an 
arbitrary set of coset representatives, and describe the extension. 

Let S be a set of coset representatives. We will always assume that we have 
chosen the identity as the representative of the coset A. We also use the notation 
$\bar{g}$ for the representative of the coset containing g. Note that the set of cosets (and 
hence S) is in one-to-one correspondence with B. We write the representative of 
the coset corresponding to b € B as s(b). 

Now any element of G has a unique representation of the form s(b)a, for 
$b \in B$, $a \in A$. Also, s(b), acting by conjugation on A, induces the automorphism 
$b\phi$. Now 


$$s(b_1)a_1 s(b_2)a_2 = (s(b_1)s(b_2))(a_1(b_2\phi)a_2),$$


and the only difference is that now we do not have $s(b_1)s(b_2) = s(b_1b_2)$. However, 
it is true that $s(b_1)s(b_2)$ lies in the coset corresponding to $b_1b_2$; so we can write 


$$s(b_1)s(b_2) = s(b_1b_2) f(b_1, b_2),$$


where f is a function from B x B to A (that is, a function of two variables in B 
taking values in A). 

The function f is called a factor set. 

Now it is clear that, if we know the action ¢ and the factor set f, then the 
group is uniquely determined; taking its elements to be B x A as before, the 
group operation $\circ$ is given by 


$$(b_1, a_1) \circ (b_2, a_2) = (b_1b_2, f(b_1, b_2)\, a_1(b_2\phi)\, a_2).$$


Note that, if the factor set is trivial (that is, it always takes the value 1), we have 
the semidirect product. 

At this point, for clarity, we change notation. We write the abelian group A 
additively (so that 0 is the identity, and −a the inverse of a), and we write $a^b$ 
instead of $a(b\phi)$. 
Theorem 7.16 The function $f : B \times B \to A$ is a factor set if and only if it 
satisfies 


(a) $f(1, b) = f(b, 1) = 0$; 
(b) $f(b_1, b_2b_3) + f(b_2, b_3) = f(b_1b_2, b_3) + f(b_1, b_2)^{b_3}$. 


Proof (a) follows from our choice of the identity as coset representative for the 
coset A. (b) is obtained by (carefully) evaluating $((b_1, 0) \circ (b_2, 0)) \circ (b_3, 0)$ and 
$(b_1, 0) \circ ((b_2, 0) \circ (b_3, 0))$ and equating the results. 

Conversely, if (a) and (b) hold, then the operation $\circ$ defined above makes the 
set $B \times A$ into a group which is an extension of A by B. 
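
A small worked case may help. The sketch below takes A and B both cyclic of order 2 with trivial action, encoding both as {0, 1} under addition mod 2 (so the identity of B is written 0 here, an encoding chosen for this illustration). The function f_twisted is a hypothetical example of a non-zero factor set. The code checks conditions (a) and (b) and then compares the element orders of the groups built from the zero factor set and from f_twisted.

```python
from itertools import product

A = [0, 1]   # cyclic group of order 2, written additively
B = [0, 1]   # cyclic group of order 2, encoded additively (identity is 0)

def act(a, b):
    """The action a^b; trivial in this example."""
    return a

def f_zero(b1, b2):
    return 0

def f_twisted(b1, b2):
    """Illustrative factor set: non-zero only on the pair (1, 1)."""
    return 1 if (b1, b2) == (1, 1) else 0

def is_factor_set(f):
    """Check conditions (a) and (b) of Theorem 7.16 for this A and B."""
    normalised = all(f(0, b) == 0 and f(b, 0) == 0 for b in B)
    cocycle = all(
        (f(b1, (b2 + b3) % 2) + f(b2, b3)) % 2
        == (f((b1 + b2) % 2, b3) + act(f(b1, b2), b3)) % 2
        for b1, b2, b3 in product(B, repeat=3)
    )
    return normalised and cocycle

def multiply(f, x, y):
    """(b1, a1) o (b2, a2) = (b1 b2, f(b1, b2) + a1^b2 + a2)."""
    (b1, a1), (b2, a2) = x, y
    return ((b1 + b2) % 2, (f(b1, b2) + act(a1, b2) + a2) % 2)

def element_orders(f):
    """Sorted orders of the four elements of the extension defined by f."""
    orders = []
    for x in product(B, A):
        y, n = x, 1
        while y != (0, 0):
            y = multiply(f, y, x)
            n += 1
        orders.append(n)
    return sorted(orders)

print(element_orders(f_zero), element_orders(f_twisted))
```

The zero factor set gives a group in which every non-identity element has order 2 (the split extension), while f_twisted gives a group with elements of order 4, that is, a cyclic group of order 4.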


Our representation of the extension by a factor set depends on the choice 
of the coset representatives. How does the factor set change if we use different 
representatives? Suppose that the coset representatives s(b) and s’(b) give factor 
sets f and $f'$ respectively. We have $s'(b) = s(b)d(b)$, where d is a function from 
B to A satisfying d(1) = 0. Then we have 


$$s(b_1)d(b_1)s(b_2)d(b_2) = s(b_1b_2)d(b_1b_2)f'(b_1, b_2).$$

After some calculation, and writing the result in additive notation, we obtain 

$$f'(b_1, b_2) - f(b_1, b_2) = d(b_1)^{b_2} + d(b_2) - d(b_1b_2).$$


We call the factor sets f and f’ equivalent if this condition holds. Thus, equiv- 
alent factor sets arise from the same group with (possibly) different choices of 
coset representatives. 

It turns out that there is a convenient algebraic description of factor sets. 
Since they are functions, we can add them pointwise: 


$$(f_1 \oplus f_2)(b_1, b_2) = f_1(b_1, b_2) + f_2(b_1, b_2).$$


The sum of factor sets is again a factor set (which can be checked using 
Theorem 7.16), and indeed the factor sets form a group $\mathcal{F}$ with this operation. 
The factor sets equivalent to the zero element (those of the form $f(b_1, b_2) = 
d(b_1)^{b_2} + d(b_2) - d(b_1b_2)$ for some function d) are called inner factor sets. They 
form a subgroup $\mathcal{I}$ of $\mathcal{F}$. Now we define the extension group E(B, A) of A by B 
(with the given action) to be the group $\mathcal{F}/\mathcal{I}$. Thus, elements of E(B, A) describe 
extensions, and the zero element describes the split extension. In particular, 


Every extension of A by B splits if and only if E(B, A) = {0}. 
From this fact, we can obtain an important theorem on extensions: 


Theorem 7.17 (Schur’s Theorem) Suppose that the abelian group A and 
the group B have coprime orders. Then any extension of A by B splits. 
Proof Let |A| = m and |B| = n, with (m, n) = 1; suppose that pm + qn = 1 
for some integers p and q. Let $f \in \mathcal{F}$ be a factor set. Since the values of f lie 
in a group of order m, Lagrange’s Theorem shows that $mf = 0 \in \mathcal{I}$. We show 
that also $nf \in \mathcal{I}$. It follows that $f = (pm + qn)f \in \mathcal{I}$, so that $\mathcal{I} = \mathcal{F}$ and 
$E(B, A) = \mathcal{F}/\mathcal{I} = \{0\}$. 

Define 


$$d(b) = \sum_{x \in B} f(x, b).$$

Now sum the equation 

$$f(b_1, b_2b_3) + f(b_2, b_3) = f(b_1b_2, b_3) + f(b_1, b_2)^{b_3}$$


(equation (b) of Theorem 7.16) over $b_1 \in B$, using the fact that $b_1b_2$ runs over 
B as $b_1$ does: we obtain 


$$d(b_2b_3) + nf(b_2, b_3) = d(b_3) + d(b_2)^{b_3},$$


so that $nf(b_2, b_3) = d(b_2)^{b_3} + d(b_3) - d(b_2b_3)$ is an inner factor set, as required. 


Remark The theorem is true without the restriction that A is abelian. If either 
A or B is soluble, then an inductive argument can be used to reduce the problem 
to the case handled by Schur. This was done by Zassenhaus, and the result is 
referred to as the Schur-Zassenhaus Theorem. In general, if the orders of A 
and B are coprime, then at least one of them must be odd, and the celebrated 
Feit-Thompson Theorem asserts that a group of odd order is soluble. But 
the proof of this theorem is several hundred pages long! 


7.8 A glimpse at homological algebra. The arguments used above give 
a glimpse of an important area of algebra on which we have not yet touched, 
known as homological algebra. The calculations that we made with factor sets 
are not as ad hoc as they appear, but are part of a much larger scheme. 

Let R be a ring, and M a right R-module. As part of the programme of studying
R via its modules, one can define a sequence of abelian groups $H^n(R, M)$ for
$n \ge 0$, called the cohomology groups of R with coefficients in M. This is in
part inspired by algebraic topology, where (abelian) groups are used as invariants 
of topological spaces. 

We cannot here even give the general definition, much less study the impor- 
tant properties of the cohomology groups. It will suffice to say how they generalise 
the extension group. 

Let G be a group, and R a ring. We define the group ring RG to be the set of all
finite sums $\sum r_i g_i$, where $r_i \in R$ and $g_i \in G$. Addition is defined coordinatewise:
$\sum r_i g_i + \sum s_i g_i = \sum (r_i + s_i) g_i$. Multiplication is defined by extending the group
operation linearly. This multiplication is often called convolution: it is given by
the rule

$$\Bigl(\sum r_i g_i\Bigr) \cdot \Bigl(\sum s_i g_i\Bigr) = \sum t_i g_i,$$

where

$$t_i = \sum_{g_j g_k = g_i} r_j s_k.$$
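
The convolution rule is easy to carry out mechanically. Here is a minimal Python sketch, under assumptions of my own choosing: an element of the group ring is stored as a dictionary from group elements to coefficients, the group operation is passed in as a function, and the example group is the cyclic group of order 3 written additively. The names `group_ring_multiply` and `c3` are hypothetical.

```python
from collections import defaultdict


def group_ring_multiply(x, y, op):
    """Convolution product of two group ring elements.

    x, y: dicts mapping group elements to coefficients (finite sums).
    op:   the group operation, a function (g, h) -> gh.
    """
    t = defaultdict(int)
    for gj, rj in x.items():
        for gk, sk in y.items():
            # r_j s_k contributes to the coefficient of the product g_j g_k.
            t[op(gj, gk)] += rj * sk
    # Drop zero coefficients so that equal elements compare equal.
    return {g: c for g, c in t.items() if c != 0}


def c3(g, h):
    """The cyclic group of order 3, written additively (an assumed example)."""
    return (g + h) % 3


x = {0: 1, 1: 1}    # 1 + g
y = {0: 1, 1: -1}   # 1 - g
product = group_ring_multiply(x, y, c3)   # (1 + g)(1 - g) = 1 - g^2
```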


Now, if A is an abelian group on which G acts, then A becomes a right RG- 
module, by extending the action of G on A linearly. 
Now the following is true: 


The extension group E(B, A) is equal to the second cohomology group
$H^2(\mathbb{Z}B, A)$.


So the second cohomology group tells us about the splitting of extensions. 

The first cohomology group also has a group-theoretic significance. In the 
split extension G of A by B (with the given action), there are usually several 
complements of A (subgroups H which satisfy $AH = G$, $A \cap H = \{1\}$). The
number of conjugacy classes of such complements is equal to the order of the
first cohomology group $H^1(\mathbb{Z}G, A)$. Indeed, there is a natural regular action of
$H^1(\mathbb{Z}G, A)$ on the set of conjugacy classes.

For higher cohomology groups, the interpretations in terms of extensions are 
less transparent. 


Exercise 7.1 Show that (GA3) is a consequence of (GA1) and (GA2). 


Exercise 7.2 Show that the proof of Cayley’s Theorem involves considering the action 
of G by right multiplication on the set of right cosets of the trivial subgroup {1}. 


Exercise 7.3 Let H and K be subgroups of the group G. 


(a) Show that $|HK| = |H| \cdot |K| / |H \cap K|$.
(b) Show that HK is a subgroup if and only if HK = KH.
(c) Suppose that $h^{-1}Kh = K$ for all $h \in H$. Show that HK is a subgroup.


Exercise 7.4 For each of the five regular solids, find a formula for the number of 
colourings of the faces with r colours, two colourings differing by a rotation being 
regarded as identical. 


Exercise 7.5 What happens if we apply the Orbit-Counting Lemma to the action of 
G on itself by conjugation? 


Exercise 7.6 Let P be a p-group, and N a non-trivial normal subgroup of P. By 
considering the action of P on N by conjugation, prove that $N \cap Z(P) \ne \{1\}$.


Exercise 7.7 (*) Let P be a Sylow p-subgroup of a finite group G, and let $H = N_G(P)$.
Prove that $N_G(H) = H$.

[Hint: if $g \in N_G(H)$, then $g^{-1}Pg$ is a Sylow subgroup of H; but H has only one
Sylow p-subgroup.]


Exercise 7.8 (*) Let G be a finite group. Show that the following properties of G are
equivalent:


(a) G is the direct product of its Sylow subgroups;
(b) if H is a proper subgroup of G, then $N_G(H)$ properly contains H.


[Hint: Any p-group satisfies (b), and this property is preserved by direct products. 
For the converse, use the preceding question.] 


Remark Such a group is said to be nilpotent. 


Exercise 7.9 Deduce from Corollary 7.8 that, if p is prime, then any subgroup of 
p-power order of a finite group G is contained in a Sylow p-subgroup of G. 


Exercise 7.10 Let N be a normal subgroup of G, and P a Sylow p-subgroup of N. 
Show that $G = N_G(P)N$.

[Hint: Take $g \in G$. Then $g^{-1}Pg$ is a Sylow subgroup of N, and so $g^{-1}Pg = n^{-1}Pn$
for some $n \in N$. Then $gn^{-1} = h \in N_G(P)$, and g = hn.]
Remark The argument in this question is called the Frattini argument. 


Exercise 7.11 Prove the Jordan–Hölder Theorem for infinite groups having composition
series of finite length.


Exercise 7.12 Show that a conjugacy class C in $S_n$ splits into two classes in $A_n$ if
and only if the cycle lengths of its members consist of distinct odd numbers.


Exercise 7.13 Show that, if an element $c \in S_n$ has cycle type which contains the
number $i$ with multiplicity $a_i$, for $1 \le i \le n$, then the order of the centraliser of c is

$$f = \prod_{i=1}^{n} i^{a_i} a_i!,$$

and hence that the conjugacy class of c has size n!/f. Deduce that the size of the
conjugacy class is greater than 2(n - 1) for n > 6.


Exercise 7.14 Show that the group PSL(2, q) has order $q(q^2 - 1)$ if q is a power of 2,
and order $q(q^2 - 1)/2$ if q is an odd prime power.


Exercise 7.15 The groups $A_5$, PSL(2, 4), and PSL(2, 5) are all simple groups of order
60. Prove that they are all isomorphic. 


Exercise 7.16 Prove that there is a unique simple group of order 60 (up to 
isomorphism). 


Exercise 7.17 Prove that a semidirect product $A \rtimes_\phi B$, where the action $\phi$ is trivial,
is the direct product A x B. 


Exercise 7.18 Prove Theorem 7.16.


Exercise 7.19 Prove that $E(C_p, C_p) \cong C_p$. [The split extension $C_p \times C_p$ corresponds
to the zero element of this group. The p - 1 non-zero elements are all associated with
the non-split extension $C_{p^2}$.]


Exercise 7.20 (a) Prove that the sum of factor sets is a factor set. 
(b) Prove that an inner factor set is a factor set. 
(c) Prove that the sum of inner factor sets is an inner factor set. 


Further ring theory 


We have some unfinished business from Chapter 2: the proof that any principal 
ideal domain (PID) is a unique factorisation domain (UFD), and the proof of 
Gauss’ Lemma. The first of these introduces a connection between factorisation 
and chain conditions in a ring, and leads us to the Hilbert Basis Theorem. 


7.9 PID implies UFD. In Chapter 2, we gave part of the proof that a principal
ideal domain is a unique factorisation domain: we showed the uniqueness,
but not the existence, of factorisations into irreducibles. Here is the remainder 
of the proof. It depends on showing that a PID satisfies a condition which is 
very important in more advanced ring theory, the ascending chain condition 
(ACC) on ideals. 


Proposition 7.18 Let R be a PID and let $I_1, I_2, \ldots$ be ideals in R forming an
ascending chain:

$$I_1 \subset I_2 \subset \cdots.$$

Then the number of ideals in the chain is finite.


Proof Suppose that we have an infinite ascending chain of ideals; say, $I_1 \subset
I_2 \subset \cdots$. Let $I = \bigcup_n I_n$ be the union of all the ideals in the chain. We apply the
Ideal Test to I:


(a) Take $x \in I$, $r \in R$. Now $I = \bigcup_n I_n$, so $x \in I_n$ for some n. Since $I_n$ is an
ideal, we have $rx, xr \in I_n$, so $rx, xr \in I$.


(b) Take $x, y \in I$. Then $x \in I_n$, $y \in I_m$ for some n and m. Without loss of generality,
$n \ge m$; then $I_m \subseteq I_n$, and so $y \in I_n$. Now $x - y \in I_n$, since $I_n$ is an ideal; so
$x - y \in I$.


Thus I is indeed an ideal.

Now R is a PID, so I = (a) for some element $a \in I$. Now $I = \bigcup_n I_n$,
and so $a \in I_n$ for some n; so all multiples of a lie in $I_n$, whence $I = I_n$. But
$I_n \subset I_{n+1} \subseteq I$, and we have a contradiction.

So an ascending chain of ideals must be finite. 


Now we prove the theorem. Let R be a PID, and suppose $a_0$ is an element
of R which is not zero or a unit and cannot be factorised into irreducibles. In
particular, $a_0$ itself is not irreducible (or we would have a factorisation with only
one term); say $a_0 = a_1 b_1$, where neither $a_1$ nor $b_1$ is zero or a unit. It cannot be
that both $a_1$ and $b_1$ have factorisations, or else we would obtain a factorisation
of $a_0$ by combining them. Suppose, without loss of generality, that $a_1$ has no factorisation.
Then $a_1 = a_2 b_2$, where we may suppose that $a_2$ has no factorisation; and so we
may continue with $a_n = a_{n+1} b_{n+1}$ for all n.

Let $I_n = (a_n)$. Since $a_n = a_{n+1} b_{n+1}$ and $b_{n+1}$ is not a unit, we have $I_n \subset I_{n+1}$;
so we have an infinite ascending chain of ideals, contrary to Proposition 7.18.

This establishes that every element in R (other than zero and units) has a 
factorisation into irreducibles. We have already proved that the factorisation is 
unique. So this completes the proof of Theorem 2.21. 
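
In the ring Z the existence argument can be turned directly into a procedure. The following Python sketch is only an illustration for positive integers (the function name `factorise` is my own): if a is not irreducible we write a = d · (a/d) with neither factor a unit, and treat each factor in the same way. Each recursive step passes from the ideal (a) to a strictly larger ideal, so the ascending chain condition is what guarantees that the recursion stops.

```python
from math import isqrt


def factorise(a):
    """Factorise an integer a > 1 into irreducibles, following the proof above.

    If a has a proper divisor d, split a = d * (a // d) and recurse on both
    factors; otherwise a itself is irreducible.
    """
    if a < 2:
        raise ValueError("expected an integer greater than 1")
    for d in range(2, isqrt(a) + 1):
        if a % d == 0:
            # Neither d nor a // d is a unit, so (a) is properly contained
            # in both (d) and (a // d).
            return factorise(d) + factorise(a // d)
    return [a]  # no proper divisor: a is irreducible
```

For example, `factorise(360)` returns the irreducible factors 2, 2, 2, 3, 3, 5 in some order.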


7.10 Noetherian rings. The crucial step in the proof in the last section is 
that, in a PID, there cannot be an infinite strictly ascending chain of ideals. This
followed from the fact that each ideal is generated by a single element. With a 
little more effort, we can give a necessary and sufficient condition. 


Theorem 7.19 The following conditions on a ring R are equivalent: 


(a) there is no infinite strictly ascending chain of ideals of R; 
(b) every ideal of R is generated by a finite number of elements. 


Proof The proof that (b) implies (a) follows closely the argument in the last
section. Suppose that we have an ascending chain of ideals, say

$$I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots.$$

Then the union of this chain is an ideal I. Since we are assuming (b), the ideal I
is finitely generated; say, $I = (r_1, r_2, \ldots, r_m)$. Now each of $r_1, \ldots, r_m$ belongs to
some ideal in the chain; say, $r_i \in I_{n_i}$ for $i = 1, \ldots, m$. If n denotes the greatest
of $n_1, \ldots, n_m$, then all of $I_{n_1}, \ldots, I_{n_m}$ are contained in $I_n$, by the fact that the
ideals form an ascending chain. Hence $I = (r_1, \ldots, r_m) \subseteq I_n$. But obviously
$I_n \subseteq I$, since $I_n$ is a member of a chain whose union is I. We conclude that
$I = I_n$, and the chain has at most n distinct terms in it.

Now we prove that (a) implies (b). Suppose that (a) holds for the ring R.
Suppose, for a contradiction, that I is an ideal which is not finitely generated.
Choose $r_1 \in I$. Then $(r_1) \subseteq I$; and the inclusion is strict, since otherwise I
would be generated by just one element. Hence we can choose $r_2 \in I \setminus (r_1)$. Then
$(r_1, r_2) \subseteq I$, and again the inclusion is strict.

Continuing in this way, we choose elements $r_1, r_2, \ldots \in I$ such that each one
is outside the ideal generated by its predecessors. Thus, if $I_n = (r_1, \ldots, r_n)$, the
chain

$$I_1 \subset I_2 \subset \cdots$$

is a strictly increasing chain of ideals, contrary to assumption.

In this proof, we made use of an innocent-looking principle: we showed that,
for each n, the set $I \setminus I_n$ is non-empty, and proceeded to choose $r_{n+1}$ from this
set. This is justified by the Axiom of Choice, which we discussed at the end
of the last chapter.

A ring R is called Noetherian, or is said to satisfy the ascending chain 
condition on ideals (or the ACC), if condition (a) of the above theorem holds. 
Thus, in particular, any principal ideal domain is Noetherian. 

The last part of the proof in the preceding section shows the following: 


Proposition 7.20 Let R be a Noetherian integral domain. Then every element 
of R which is not zero or a unit can be factorised into irreducibles. 


Here is an example of a non-Noetherian integral domain, in which factorisations
into irreducibles do not necessarily exist. Let F be a field, and let R
be the ring of all expressions which are finite sums of terms of the form $a x^q$,
where $a \in F$, q is a non-negative rational number, and x is an indeterminate.
Addition and multiplication are defined in the ‘obvious’ way. (If we had said
‘non-negative integer’ rather than ‘non-negative rational number’, this would be
just the polynomial ring F[x].) Now x cannot be factorised into irreducibles: for

$$x = x^{1/2} \cdot x^{1/2} = x^{1/2} \cdot x^{1/4} \cdot x^{1/4} = \cdots,$$

and none of the factors is a unit. From this, it is possible to extract an ideal
which is not finitely generated, or an infinite ascending chain of ideals.

On the other hand, the following theorem ensures a supply of Noetherian 
rings. 


Theorem 7.21 (Hilbert Basis Theorem) Let R be a commutative 
Noetherian ring with identity. Then R[x] is Noetherian. 


Proof Let J be an ideal in R[x]. Then it is not hard to show that the leading
coefficients of polynomials of degree n in J form an ideal $I_n$ of R. Moreover,
$I_n \subseteq I_{n+1}$: for, if $f \in J$ has degree n, then $xf \in J$ has degree n + 1 and has the
same leading coefficient as f.

Since R is Noetherian, the ascending chain $I_0 \subseteq I_1 \subseteq \cdots$ is finite. Say that
$I_n = I_m$ for all $m \ge n$. Moreover, $I_n$ is finitely generated. Let $f_1, \ldots, f_k$ be
polynomials of degree n whose leading coefficients generate $I_n$.

Let $g \in J$ be a polynomial of degree $m \ge n$. Then, since $I_m = I_n$, there are
elements $r_1, \ldots, r_k \in R$ such that $r_1 f_1 + \cdots + r_k f_k$ has the same leading coefficient
as g. In other words, $g - x^{m-n}(r_1 f_1 + \cdots + r_k f_k)$ belongs to J and has degree less
than m. By induction, every element of J is the sum of a polynomial with degree
less than n and one of the form $f_1 h_1 + \cdots + f_k h_k$, for some $h_1, \ldots, h_k \in R[x]$.

Similarly, for each p < n there is a finite set $S_p$ of polynomials in J which
have degree p and whose leading coefficients generate $I_p$. Arguing as above, any
polynomial in J with degree less than n is a linear combination of $S_0 \cup \cdots \cup S_{n-1}$
(with coefficients in R).

So J is finitely generated. 

The dual condition to ACC is also important. A commutative ring R is said
to satisfy the descending chain condition or DCC, or to be Artinian, if
every strictly descending chain $I_1 \supset I_2 \supset \cdots$ of ideals is finite. This condition
turns out to be very strong:


Theorem 7.22 (Hopkins’ Theorem) If a commutative ring with identity is
Artinian, then it is Noetherian, and has the property that there is a finite upper 
bound on the length of any chain of ideals. 


The converse is false: the ring Z is Noetherian but not Artinian. 
A similar result holds for non-commutative rings. We do not discuss this here: 
see the books by McCoy or Cohn listed under Further Reading. 


7.11 Gauss’ Lemma. Let R be a UFD. We want to show that the
polynomial ring R[x] is a UFD.

Take a polynomial $f(x) = a_n x^n + \cdots + a_1 x + a_0 \in R[x]$. We define the content
of f, written C(f), to be the greatest common divisor (g.c.d.) of the coefficients
$a_n, \ldots, a_1, a_0$. (Remember that greatest common divisors exist in a UFD, and
are determined uniquely up to associates.) We say that f is primitive if its
content is associate to 1 (that is, C(f) is a unit). Then any polynomial f can be
written $f = C(f) \cdot f_1$, where $f_1$ is primitive.


Proposition 7.23 If f and g are primitive, then fg is primitive. 


(This is the crucial step in the proof; it is sometimes called Gauss’ Lemma, 
rather than the theorem that R[x] is a UFD.) 


Proof Suppose that fg is not primitive; let p be an irreducible which divides
its content C(fg). Let

$$f = a_n x^n + \cdots + a_1 x + a_0,$$
$$g = b_m x^m + \cdots + b_1 x + b_0.$$

Now f and g are primitive, so p cannot divide all the coefficients of either
polynomial. Choose r and s such that:

$$p \mid a_i \text{ for } i < r \text{ but } p \nmid a_r;$$
$$p \mid b_j \text{ for } j < s \text{ but } p \nmid b_s.$$

Consider the coefficient of $x^{r+s}$ in fg. This is given by

$$c_{r+s} = \cdots + a_{r-1} b_{s+1} + a_r b_s + a_{r+1} b_{s-1} + \cdots.$$

Now p divides $a_i$ for i < r, so p divides all the terms before $a_r b_s$ in the sum.
Similarly, p divides $b_j$ for j < s, so p divides all terms after $a_r b_s$. But p does
not divide $a_r b_s$, by assumption. (In a UFD, if an irreducible p divides ab, then
p divides a or p divides b.) So p does not divide the coefficient $c_{r+s}$. But this
contradicts the assumption that p divides the content of fg. The contradiction
shows that no such p can exist, so fg is primitive.


It follows that, for any two polynomials f and g, we have C(fg) = C(f)C(g).
For, if we write $f = C(f) f_1$ and $g = C(g) g_1$, where $f_1$ and $g_1$ are primitive, then
$fg = C(f)C(g) f_1 g_1$, and $f_1 g_1$ is primitive, so the content of fg is C(f)C(g).
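
The multiplicativity of the content is easy to test numerically in the case R = Z. The sketch below is an illustration under my own conventions, not part of the theory: a polynomial is stored as a list of integer coefficients with the constant term first, the content is taken to be the non-negative g.c.d., and the helper names `content` and `poly_mul` are hypothetical.

```python
from functools import reduce
from math import gcd


def content(f):
    """Content of an integer polynomial given as [a_0, a_1, ..., a_n]."""
    return reduce(gcd, f, 0)


def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


f = [2, 4, 6]   # 2 + 4x + 6x^2, content 2
g = [3, 9]      # 3 + 9x, content 3
fg = poly_mul(f, g)
```

Here `content(fg)` agrees with `content(f) * content(g)`, as the argument above predicts.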

Now the work of factorising a polynomial in R[x] can be divided into two 
parts: factorise its content; and factorise a primitive polynomial. The content 
can be factorised uniquely, since R is a UFD. For primitive polynomials, we 
need to consider the field of fractions F of R. 


Proposition 7.24 Let R be a UFD with field of fractions F. 


(a) A primitive polynomial in R[x] is irreducible in R[x] if and only if it is
irreducible in F[x].

(b) If f and g are primitive polynomials in R[x] and f divides g in F[x], then
f divides g in R[x].


Proof (a) If f factorises in R[x], then it factorises in F[x]. Conversely, suppose
that f = gh in F[x]. The coefficients in g and h are fractions of elements of R; let
a and b be the least common multiples of their denominators. Then $ag, bh \in R[x]$,
and abf = (ag)(bh). Hence

$$ab = C(abf) = C(ag)C(bh).$$

Now we can write $ag = C(ag) g_1$, $bh = C(bh) h_1$, where $g_1$ and $h_1$ are primitive.
Then

$$abf = C(ag)C(bh) g_1 h_1.$$

From the two displayed equations and the fact that $g_1 h_1$ is primitive, we see that
$f = g_1 h_1$, a factorisation of f in R[x].

(b) Suppose that g = fh with $h \in F[x]$. As above, let b be the least common
multiple of the denominators of the coefficients in h. Then $bg = f \cdot bh$ in R[x], so,
taking contents and using the fact that f and g are primitive,

$$b = C(bh),$$
$$bg = f \, C(bh) h_1,$$

where $h_1$ is primitive. So $g = f h_1$, as required.


Thus, factorisation of primitive polynomials in R[x] exactly mirrors their 
factorisation in F[x], so is unique up to order and associates.


7.12 Eisenstein’s criterion. Gauss’ Lemma is very useful for the practical 
business of factorising polynomials. As an application, here is a simple proof, 
using Gauss’ Lemma, of the theorem of Pythagoras. 


Theorem 7.25 $\sqrt{2}$ is irrational.


Proof It is enough to show that the polynomial $x^2 - 2$ is irreducible over $\mathbb{Q}$,
since if $q = \sqrt{2}$ were rational, then x - q would be a factor of $x^2 - 2$. Now $x^2 - 2$ is
a primitive polynomial over $\mathbb{Z}$, so, if it factorised in $\mathbb{Q}[x]$, then it would factorise
in $\mathbb{Z}[x]$, by Gauss’ Lemma. But, if ax + b divides $x^2 - 2$ in $\mathbb{Z}[x]$, then a divides 1,
and b divides -2; so the only possible factors are x + 1, x - 1, x + 2, or x - 2.
None of these is a factor, since none of $\pm 1, \pm 2$ is equal to $\sqrt{2}$.


What Gauss’ Lemma is telling us here is that, if the monic integer polynomial 
$x^2 - 2$ has a rational root, then this root is an integer.
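
The same reasoning gives a finite search for the rational roots of any monic integer polynomial with non-zero constant term: every rational root is an integer, and it must divide the constant term. A minimal Python sketch follows; the function name and the coefficient-list convention (constant term first) are my own assumptions.

```python
def integer_roots(coeffs):
    """Integer roots of a monic integer polynomial [a_0, ..., a_{n-1}, 1], a_0 != 0.

    Only the divisors of the constant term a_0 (with either sign) need testing.
    """
    a0 = coeffs[0]
    if coeffs[-1] != 1 or a0 == 0:
        raise ValueError("expected a monic polynomial with non-zero constant term")
    candidates = [s * d for d in range(1, abs(a0) + 1) if a0 % d == 0 for s in (1, -1)]

    def evaluate(x):
        # Horner's rule, starting from the leading coefficient.
        value = 0
        for c in reversed(coeffs):
            value = value * x + c
        return value

    return sorted(x for x in candidates if evaluate(x) == 0)
```

For $x^2 - 2$ the call `integer_roots([-2, 0, 1])` returns an empty list, in agreement with Theorem 7.25, whereas `integer_roots([-4, 0, 1])` finds the roots -2 and 2 of $x^2 - 4$.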

In view of the importance of Gauss’ Lemma, it is interesting that it has a 
very much simpler proof for integer polynomials, or indeed for polynomials over 
any PID. 


Proposition 7.26 Let R be a principal ideal domain, and let f and g be 
primitive polynomials in R[x]. Then fg is primitive.


Proof Suppose not, so that there is an irreducible element $p \in R$ which divides
every coefficient of fg. Now F = R/(p) is a field. By considering the coefficients
of a polynomial $f \in R[x]$ mod p, we obtain a polynomial $\bar{f} \in F[x]$.

Now $\bar{f}$ and $\bar{g}$ are non-zero, since p does not divide all the coefficients of f
or g (these polynomials are primitive); but $\bar{f} \cdot \bar{g} = \overline{fg} = 0$, by assumption. This
contradicts the fact that F[x] is an integral domain.


The trick in the proof could be stated like this. The natural homomorphism 
from R to R/(p) = F extends to a homomorphism from R[x] to F[x].
This trick has other uses too. 


Theorem 7.27 (Eisenstein’s criterion) Let R be a principal ideal domain, 
and p an irreducible in R. Let 


$$f(x) = a_n x^n + \cdots + a_1 x + a_0$$

be a primitive polynomial in R[x] with the following properties:


(a) p does not divide the leading coefficient $a_n$;
(b) p divides the other coefficients $a_{n-1}, \ldots, a_1, a_0$;
(c) $p^2$ does not divide the constant term $a_0$.


Then f is irreducible. 


Proof Suppose that f = gh. By reducing mod p, we have $\bar{f} = \bar{g}\bar{h}$. By assumptions
(a) and (b), we have $\bar{f} = \bar{a}_n x^n$ (all the other terms are equal to zero mod p).
Now the ways in which a power of x can factorise are very limited: we must have
$\bar{g} = \bar{b}_m x^m$ and $\bar{h} = \bar{c}_{n-m} x^{n-m}$, say, with 0 < m < n. Thus we have

$$g(x) = b_m x^m + \cdots + b_1 x + b_0,$$
$$h(x) = c_{n-m} x^{n-m} + \cdots + c_1 x + c_0,$$

where all the coefficients except the leading ones are divisible by p.
Now consider the constant term. We have $a_0 = b_0 c_0$, and p divides both $b_0$ and
$c_0$; so $p^2$ divides $a_0$, contradicting assumption (c). The theorem is proved.
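
For integer polynomials the three hypotheses are simple divisibility checks. The following Python sketch tests them for a given prime p. It is only an illustration: the function name is hypothetical, coefficients are listed with the constant term first, p is assumed to be prime, and primitivity of f is not checked. A result of False says only that the criterion does not apply with this p, not that the polynomial is reducible.

```python
def eisenstein_applies(coeffs, p):
    """Check hypotheses (a)-(c) of Eisenstein's criterion over Z.

    coeffs: [a_0, a_1, ..., a_n] with n >= 1; p is assumed to be prime.
    """
    *lower, lead = coeffs
    return (
        lead % p != 0                         # (a) p does not divide a_n
        and all(a % p == 0 for a in lower)    # (b) p divides a_0, ..., a_{n-1}
        and lower[0] % (p * p) != 0           # (c) p^2 does not divide a_0
    )
```

For instance, the criterion applies to $x^2 - 2$ with p = 2, but not to $x^2 - 4$, where condition (c) fails.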


Example 1 Taking p = 2, we see that $x^2 - 2$ is irreducible over $\mathbb{Z}$ (and hence
over $\mathbb{Q}$, by Gauss’ Lemma).

More generally, if p is prime and n > 1, then $x^n - p$ is irreducible, so the nth
root of p is irrational.


Example 2 Sometimes Eisenstein’s criterion does not apply to the given 
polynomial, but can be made to do so by transforming it. 
Consider the polynomial

$$f(x) = x^{p-1} + x^{p-2} + \cdots + x + 1,$$

where p is prime, whose roots are the complex pth roots of unity other than 1. Eisenstein’s
criterion does not apply, since 1 has no prime factors. Instead, we consider g(x) =
f(x + 1). We have $f(x) = (x^p - 1)/(x - 1)$, so $g(x) = ((x + 1)^p - 1)/x$.


(a) The coefficient of $x^{p-1}$ in g(x) is 1.

(b) For $1 \le i \le p - 2$, the coefficient of $x^i$ in g(x) is the coefficient of $x^{i+1}$ in
$(x + 1)^p$, namely $\binom{p}{i+1}$, which is divisible by p.

(c) The constant term in g(x) is equal to g(0) = f(1) = p, which is divisible by
p but not by $p^2$.
By Eisenstein’s criterion, g (and hence also f) is irreducible. 
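
The three claims about g can be confirmed for particular primes by computing its coefficients from the binomial theorem. This is a small Python check under my own conventions (constant term first; the function name is hypothetical), not a substitute for the argument above.

```python
from math import comb


def shifted_cyclotomic(p):
    """Coefficients [c_0, ..., c_{p-1}] of g(x) = ((x + 1)^p - 1) / x.

    The coefficient of x^i in g is the coefficient of x^(i+1) in (x + 1)^p.
    """
    return [comb(p, i + 1) for i in range(p)]


g5 = shifted_cyclotomic(5)   # x^4 + 5x^3 + 10x^2 + 10x + 5, constant term first
```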


Example 3 The polynomial $x^2 + y^2 - 1$ in $\mathbb{Q}[x, y]$ is irreducible. For we regard
this expression as a polynomial in R[x], where $R = \mathbb{Q}[y]$: it has the form $x^2 +
0x + (y + 1)(y - 1)$. Now Eisenstein’s criterion applies, where we take p to be the
irreducible y - 1 in R.

This argument works over any field in which $y + 1 \ne y - 1$; that is, any field
of characteristic different from 2. It fails for a field of characteristic 2, and in this
case we have $x^2 + y^2 - 1 = (x + y - 1)^2$.


7.13 A glimpse at algebraic geometry. Algebraic geometry represents
the flowering of the seed planted by Descartes when he turned geometry into 
algebra. It is a central subject of modern mathematics, and we can get no more 
than a brief glimpse here. Fortunately, good introductory books are available. 

An algebraic curve in the plane is a set of points whose coordinates satisfy 
some polynomial equation. Examples such as a circle $x^2 + y^2 = 1$ and a parabola
$y = x^2$ are familiar, but many more complicated examples have been studied.

When we look at higher dimensions, however, we see that the definition 
must be broadened. For example, in three dimensions, a polynomial equation 
defines a surface, and a curve may be defined as the intersection of two surfaces. 
Such a curve cannot be defined by a single equation. For example, the cylinders 
$x^2 + y^2 = 1$ and $x^2 + z^2 = 4$ in 3-dimensional Euclidean space meet in a pair of
non-plane curves. 

The approach we take is to consider all polynomials which vanish on the set
of points in question. Accordingly, given a set S of polynomials in n variables
over a field F, the algebraic set A(S) defined by S is the set of all points in
$F^n$ on which all the polynomials vanish. (So the curves referred to above form
$A(x^2 + y^2 - 1, x^2 + z^2 - 4)$.) Conversely, for any subset X of $F^n$, we let I(X) be
the set of all polynomials vanishing on X.

We see immediately that I(X) is an ideal. By the Hilbert Basis Theorem
(Theorem 7.21), we know that I(X) is generated by a finite number of elements.
Moreover, $A((f, g)) = A((f)) \cap A((g))$. So any algebraic set is the intersection of
a finite number of algebraic sets defined by single polynomial equations.

The ideal I(X) has another important property: if $f^n \in I(X)$ for some
positive integer n, then $f \in I(X)$. An ideal with this property is called a radical
ideal. 

We might hope that there is an exact correspondence between algebraic sets 
and radical ideals. In general, this is not so. For example, over the real numbers, 
the equations $x^2 + y^2 = 0$ and $x^4 + y^4 = 0$ both define the algebraic set consisting
only of the origin. Of course, over the complex numbers, they define larger and 
quite different algebraic sets. So we should work over an algebraically closed 
field in order to obtain the nicest properties. Now Hilbert’s Nullstellensatz 
(‘Theorem on zeros’) states the following: 


Theorem 7.28 (Nullstellensatz) Let F be an algebraically closed field. Let 
$f, g_1, \ldots, g_m$ be polynomials in n variables over F, and suppose that, for any
$x \in F^n$,

$$g_1(x) = \cdots = g_m(x) = 0 \implies f(x) = 0.$$

Then, for some positive integer k, we have $f^k \in (g_1, \ldots, g_m)$.


Corollary 7.29 Let F be algebraically closed. Then the maps I and A defined 
above are mutually inverse bijections between the algebraic sets in $F^n$ and the
radical ideals in $F[x_1, \ldots, x_n]$.


Thus any problem about algebraic sets over an algebraically closed field F 
can be translated into a problem about ideals in the polynomial ring. We cannot 
pursue this correspondence much further; we end with a few observations. 

It is clear that the correspondence above interchanges inclusion: a larger 
algebraic set satisfies fewer equations. Adding ideals corresponds to intersecting 
algebraic sets, while multiplying ideals corresponds to taking the union of algebraic
sets. Also, since $F[x_1, \ldots, x_n]$ is a unique factorisation domain, an algebraic
set defined by a single polynomial is the union of a finite number of algebraic 
sets defined by irreducible polynomials. 

The coordinate ring of an algebraic set A is the ring $F[x_1, \ldots, x_n]/I(A)$. It
consists of all the functions on A which are algebraic (induced by polynomial
functions on $F^n$). It is a commutative Noetherian ring, generated over F by n
elements (the images of the coordinate functions $x_1, \ldots, x_n$ under the canonical
homomorphism). The next stage of development in algebraic geometry is
to regard an algebraic set as a space in which to work, which is just as good as the
original space $F^n$: its coordinate ring replaces the polynomial ring, and a similar
correspondence between radical ideals and algebraic sets can be described. Now
the same algebraic set (as specified by its coordinate ring) may be represented in
many different ways in $F^n$ (perhaps for different n); but now we can ignore these
differences in representation and concentrate on the structure of the algebraic set.

To end on a specific note, here is the algebraic proof of a fact which is intuitively
clear geometrically. We define a plane curve to be an algebraic set A(f),
where $f \in F[x, y]$; it is irreducible if f is irreducible.


Theorem 7.30 (Bezout’s Theorem) Let A(f) and A(g) be plane curves 
over a field F such that A(f) is irreducible and not contained in A(g). Then
A(f) and A(g) intersect in only finitely many points. 


Proof At least one of the variables, say x, must occur in f. Let K be the field of
fractions of F[y]. Then we can view f and g as elements in K[x]. Gauss’ Lemma
shows that f is irreducible in K[x]. The hypothesis that A(f) is not contained
in A(g) shows that f does not divide g. Hence f and g are coprime. Since K[x]
is a PID, there exist polynomials p and q in K[x] such that pf + qg = 1.

Now the coefficients of p and q are rational functions in y. So we can multiply 
up by the least common multiple of their denominators (say, h(y)) to obtain 


$$r(x, y) f(x, y) + s(x, y) g(x, y) = h(y),$$


where r = hp and s = hq. 

Let (a,b) be a point lying on both curves. Then f(a,b) = g(a,b) = 0, and 
so h(b) = 0. This equation has only a finite number of solutions; and for each 
solution b, the equation f(x, b) = 0 has only a finite number of solutions in x. The
theorem is proved. 


7.14 Local rings. In this section we consider only commutative rings with 
identity. 


Definition A local ring is a commutative ring with identity which has a 
unique maximal ideal. 


We shall give several constructions of local rings which are important in many 
parts of algebra: formal power series rings, p-adic integers, and localisations. We 
begin with a simple property of local rings. 


Proposition 7.31 Let I be a proper ideal of a commutative ring R with iden- 
tity. Then R is a local ring with maximal ideal I if and only if every element 
outside I is a unit. 


Proof Suppose that R is a local ring and I its maximal ideal. For $a \notin I$, the
ideal aR generated by a is not contained in I. By Krull’s Theorem, if it were
a proper ideal, it would be contained in a maximal ideal, which must be I. So aR = R, whence
there exists $b \in R$ with ab = 1.

Conversely, since no unit can lie in a proper ideal, it follows that if every 
element outside J is a unit, then every proper ideal is contained in J, so that I 
is the unique maximal ideal. 


Definition Let F be a field. The formal power series ring over F is the set
F[[x]] of all infinite sequences $(a_n) = (a_0, a_1, a_2, \ldots)$ of elements of F; addition
and multiplication are defined by


• $(a_n) + (b_n) = (c_n)$, where $c_n = a_n + b_n$;


• $(a_n) \cdot (b_n) = (d_n)$, where $d_n = \sum_{i=0}^{n} a_i b_{n-i}$.


We usually write the sequence $(a_0, a_1, \ldots)$ as $\sum_{n \ge 0} a_n x^n$; then the addition and
multiplication appear natural. The set of elements $\sum a_n x^n$ of F[[x]] with $a_0 = 0$
is easily seen to be an ideal. So the fact that F[[x]] is a local ring follows from
the next result.


Proposition 7.32 A formal power series $\sum a_n x^n$ in F[[x]] is invertible if
$a_0 \ne 0$.


Proof An inverse $\sum b_n x^n$ for the given sequence should satisfy $a_0 b_0 = 1$ and

$$a_0 b_n + a_1 b_{n-1} + \cdots + a_{n-1} b_1 + a_n b_0 = 0$$

for n > 0. The first equation gives $b_0 = a_0^{-1}$. The second allows us to find the
other coefficients by induction: if we know $b_0, \ldots, b_{n-1}$, then

$$b_n = -a_0^{-1} \sum_{i=1}^{n} a_i b_{n-i}.$$

The element $\sum b_n x^n$ given by this induction is clearly the inverse of the given
element.
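
The induction in this proof is effectively an algorithm. Here is a minimal Python sketch which computes the first few coefficients of the inverse over the field Q, using exact fractions. The function name and the convention that missing coefficients are zero are assumptions made for the illustration.

```python
from fractions import Fraction


def series_inverse(a, terms):
    """First `terms` coefficients of the inverse of sum a_n x^n over Q.

    a: list [a_0, a_1, ...] with a_0 != 0; coefficients beyond the list are 0.
    """
    if a[0] == 0:
        raise ValueError("a_0 must be non-zero")

    def coeff(i):
        return Fraction(a[i]) if i < len(a) else Fraction(0)

    inv_a0 = 1 / Fraction(a[0])
    b = [inv_a0]                                   # b_0 = a_0^{-1}
    for n in range(1, terms):
        s = sum(coeff(i) * b[n - i] for i in range(1, n + 1))
        b.append(-inv_a0 * s)                      # b_n = -a_0^{-1} sum a_i b_{n-i}
    return b
```

For example, the inverse of 1 - x begins 1, 1, 1, ..., and the inverse of $1 - x - x^2$ begins 1, 1, 2, 3, 5, 8, which are the Fibonacci numbers.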


Remark If you have studied combinatorics, you may recognise the connection 
between recurrence relations and inverses of formal power series. 


Since the set J of formal power series with zero constant term is a maximal 
ideal, we see that F[[x]]/J is a field; in fact, it is the field F.

Next, let p be a prime number. The p-adic integers are ‘limits’ of consistent 
sequences of congruences modulo higher and higher powers of p. More precisely: 


Definition A p-adic integer is a sequence $(a_1, a_2, \ldots)$, where $a_n \in \mathbb{Z}_{p^n}$, satisfying
the condition that $a_n$ and $a_{n+1}$ are congruent modulo $p^n$ for n = 1, 2, .... (In
other words, $a_{n+1}$ represents the same element of $\mathbb{Z}_{p^n}$ as $a_n$ does.) Addition and
multiplication are defined componentwise: $(a_n) + (b_n) = (a_n + b_n)$, $(a_n) \cdot (b_n) =
(a_n b_n)$. The ring of p-adic integers is denoted by $\mathfrak{O}_p$.


Remark I have to make an apology here about conflicting notation. Many 
mathematicians use the symbol $\mathbb{Z}_p$ for the ring of p-adic integers, but (following
another very common convention) I have used this symbol for the ring of integers
mod p. The letter $\mathfrak{O}$ is Gothic (or Fraktur) capital O. It is difficult to write
without practice; I recommend a capital O in handwriting in your notes. 


Proposition 7.33 For any prime number p, $\mathfrak{O}_p$ is a local ring.


Proof First note that the identity element of $\mathfrak{O}_p$ is the all-1 sequence (the
constant sequence with value 1).

We have to show that a sequence $(a_n)$ satisfying the defining condition, with
$a_1 \ne 0$, is invertible.

By Euclid’s Algorithm, we can find $b_1$ such that $a_1 b_1 \equiv 1 \pmod{p}$.

Suppose that we have found $b_1, \ldots, b_n$ such that $a_i b_i \equiv 1 \pmod{p^i}$ and $b_{i+1} \equiv b_i \pmod{p^i}$.
Let $b_{n+1} = b_n + x p^n$. Now

$$a_{n+1} b_n \equiv a_n b_n \equiv 1 \pmod{p^n},$$

so $a_{n+1} b_n \equiv 1 + y p^n \pmod{p^{n+1}}$ for some y. Then we find that

$$a_{n+1} b_{n+1} \equiv 1 + y p^n + x a_1 p^n + \text{higher powers of } p \pmod{p^{n+1}},$$

so that if we choose x to satisfy $y + x a_1 \equiv 0 \pmod{p}$, we have succeeded. Thus by
induction we have constructed an inverse.
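
The lifting step in this proof can be carried out explicitly. The sketch below handles only a special case, chosen for illustration: the p-adic integer is the image of an ordinary integer a not divisible by p, so that $a_n$ is a mod $p^n$. The function name is hypothetical, and the built-in call `pow(a, -1, p)` for the inverse mod p requires Python 3.8 or later.

```python
def padic_inverse(a, p, terms):
    """Return [b_1, ..., b_terms] with a*b_n = 1 mod p^n and b_{n+1} = b_n mod p^n.

    a is an ordinary integer not divisible by the prime p, standing for the
    p-adic integer (a mod p, a mod p^2, ...).
    """
    if a % p == 0:
        raise ValueError("a must not be divisible by p")
    inv_mod_p = pow(a, -1, p)           # b_1: inverse of a modulo p
    b = [inv_mod_p]
    for n in range(1, terms):
        pn = p ** n
        y = (a * b[-1] - 1) // pn       # a*b_n = 1 + y*p^n exactly
        x = (-y * inv_mod_p) % p        # solve y + x*a = 0 mod p
        b.append(b[-1] + x * pn)        # b_{n+1} = b_n + x*p^n
    return b


b = padic_inverse(3, 5, 4)   # successive inverses of 3 modulo 5, 25, 125, 625
```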


We will generalise this construction in Exercises 7.29 and 7.30. 
The third construction is very general. 


Definition Let R be a commutative ring with identity. A non-empty subset 
S of R is called multiplicative if $0 \notin S$ and, if $a, b \in S$, then $ab \in S$. If S is
multiplicative, we define $RS^{-1}$ in a similar way to the construction of the field
of fractions of an integral domain: the elements are equivalence classes [r, s] of
ordered pairs (r, s) with $r \in R$ and $s \in S$, where $(r_1, s_1)$ is equivalent to $(r_2, s_2)$
if $r_1 s_2 = r_2 s_1$. Now define addition and multiplication of equivalence classes by


• $[r_1, s_1] + [r_2, s_2] = [r_1 s_2 + r_2 s_1, s_1 s_2]$,
• $[r_1, s_1] \cdot [r_2, s_2] = [r_1 r_2, s_1 s_2]$.


These operations are well defined and make $RS^{-1}$ into a ring. The elements [r, 1],
for $r \in R$, form a subring isomorphic to R, and the elements [s, 1], for $s \in S$, are
invertible. 


For example, if R is an integral domain, then the set of all non-zero elements
of R is multiplicative, and the ring R(R \ {0})^{-1} is the field of fractions of R.

Definition An ideal I of a commutative ring R with identity is said to be a
prime ideal if, for any two elements a, b ∈ R, ab ∈ I implies a ∈ I or b ∈ I.
Note that I is prime if and only if R/I is an integral domain.

If I is a prime ideal, then S = R \ I is multiplicative. We define the localisation
of R at I to be the ring R_I = R(R \ I)^{-1}.


Now it turns out that R_I is a local ring. For let

J = {[r, s] : r ∈ I, s ∉ I}.


Then J is an ideal of R_I, and it is easily checked that every element not in J is
invertible.
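
A familiar instance is R = Z with I the ideal generated by a prime p: the localisation then consists of the rationals whose denominator is not divisible by p. The sketch below uses Python's fractions.Fraction (which keeps fractions in lowest terms) to test membership; the function names are invented for this illustration.

```python
from fractions import Fraction


def in_localisation(q, p):
    """True if the rational q lies in Z localised at the prime ideal (p)."""
    return q.denominator % p != 0


def in_maximal_ideal(q, p):
    """True if q lies in J = {[r, s] : r in (p), s not in (p)}."""
    return in_localisation(q, p) and q.numerator % p == 0


p = 3
unit = Fraction(10, 7)      # not in J, so its inverse should stay in the ring
non_unit = Fraction(6, 7)   # in J, so its inverse 7/6 should leave the ring

print(in_localisation(1 / unit, p), in_localisation(1 / non_unit, p))
```

The elements outside J are exactly those whose inverse is still in the ring, which is the property that makes the localisation a local ring.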


Exercise 7.21 Eisenstein’s criterion is in fact valid for polynomials over a unique 
factorisation domain. Prove this. 


Exercise 7.22 In this exercise, we show that the following three assertions about an 
integer prime p are equivalent: 


(a) either p = 2 or p is congruent to 1 mod 4; 
(b) −1 is congruent to a square mod p;
(c) p is the sum of two squares of integers. 


Proof (a) implies (b): For p = 2 this is trivial. For p = 4m + 1, use the fact
that the multiplicative group of Z_p is cyclic of order 4m (Proposition 7.45), and
hence contains a cyclic subgroup of order 4.

(b) implies (c): Suppose that p divides a^2 + 1. The ring Z[i] of Gaussian
integers is Euclidean, and hence a PID. Let I be the ideal (a + i, p). Let I =
(x + yi). Show that x^2 + y^2 = p.

(c) implies (a): Any integer square is congruent to 0 or 1 mod 4. 
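
Before attempting the proof, it can be reassuring to see the three conditions agree on small primes. This is a brute-force check, not a proof; the helper names cond_a, cond_b, cond_c and the bound 500 are arbitrary choices made for this sketch.

```python
from math import isqrt


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))


def cond_a(p):
    """p = 2 or p is congruent to 1 mod 4."""
    return p == 2 or p % 4 == 1


def cond_b(p):
    """-1 is congruent to a square mod p."""
    return any((a * a + 1) % p == 0 for a in range(p))


def cond_c(p):
    """p is a sum of two integer squares."""
    return any(isqrt(p - x * x) ** 2 == p - x * x for x in range(isqrt(p) + 1))


primes = [p for p in range(2, 500) if is_prime(p)]
agree = all(cond_a(p) == cond_b(p) == cond_c(p) for p in primes)
print(agree)
```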


Exercise 7.23 Let R be a commutative ring, and I an ideal of R. The radical of I is
defined to be the set of all r ∈ R for which r^n ∈ I for some positive integer n.


(a) Prove that the radical of an ideal is a radical ideal.
(b) If g_1, ..., g_m ∈ F[x_1, ..., x_n], where F is algebraically closed, prove that
I(A(g_1, ..., g_m)) is the radical of (g_1, ..., g_m).


Exercise 7.24 Prove that the polynomials of degree at most n in F[x, y] form a vector
space of dimension (n+1)(n+2)/2. Can you find the analogous formula for polynomials 
in k variables? 


Exercise 7.25 Prove that F[[x]] (for a field F) and 𝔒_p (for a prime number p) are
integral domains.


Exercise 7.26 Show that the inverse of 1 − x − x^2 in Q[[x]] is Σ F_n x^n, where F_n is
the nth Fibonacci number (see Exercise 1.52).
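
The claim in Exercise 7.26 can be explored by inverting a power series term by term. The sketch below truncates the series to finitely many coefficients (lowest degree first); series_inverse is a name chosen here, and how the output lines up with the indexing F_n of Exercise 1.52 is left to the reader.

```python
from fractions import Fraction


def series_inverse(f, terms):
    """Invert f (list of coefficients, f[0] != 0) in Q[[x]] up to x^(terms-1)."""
    f = [Fraction(c) for c in f] + [Fraction(0)] * terms
    g = [1 / f[0]]
    for n in range(1, terms):
        # the coefficient of x^n in f*g must vanish
        s = sum(f[k] * g[n - k] for k in range(1, n + 1))
        g.append(-s / f[0])
    return g


coeffs = series_inverse([1, -1, -1], 10)   # 1 - x - x^2
print([int(c) for c in coeffs])
```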

Exercise 7.27 Show that Z is a subring of 𝔒_p.


Exercise 7.28 Let p be an odd prime, and a an integer. Suppose that a has a square
root in Z_p. Prove that a has a square root in 𝔒_p.
(*) What happens if p = 2?


Exercise 7.29 (*) This exercise and the next generalise the construction of the p-adic
integers.

Let R_1, R_2, ... be commutative rings with identity. Suppose that, for all m ≥ n, we
have a homomorphism θ_{m,n} : R_m → R_n which is surjective. Suppose further that


• θ_{n,n} is the identity map for all n;
• for all p ≥ m ≥ n, θ_{p,m} θ_{m,n} = θ_{p,n}.


Now let lim R_n denote the set of all sequences (r_1, r_2, ...), where r_n ∈ R_n for all n
and r_m θ_{m,n} = r_n for all m ≥ n. Define componentwise addition and multiplication on
this set, and show that it is a ring. (It is called the inverse limit of the family of rings
and homomorphisms.)


Exercise 7.30 (*) Let I_1 ⊇ I_2 ⊇ ... be a descending chain of ideals in a commutative
ring R with identity, and suppose that ⋂ I_n = {0}. (Such a sequence is called a
filtration of R.)

Show that the rings R_n = R/I_n and natural homomorphisms θ_{m,n} : R_m → R_n
satisfy the conditions of the preceding exercise. (The ring R̂ = lim(R/I_n) is called the
completion of R with respect to the filtration.)

Show that R is embedded in R̂ by the map r ↦ (I_n + r : n ∈ N).

Show that, if R = Z and I_n = p^n Z, where p is prime, then R̂ = 𝔒_p.


Further field theory 


We saw in Section 2.16 that the standard way to construct a field is by adjoining a 
root of a polynomial to a smaller field. In this section, we examine the procedure 
more closely, and iterate it to adjoin all the roots of a polynomial. We apply 
this to prove a theorem of Galois on the existence and uniqueness of finite fields. 
Apart from its intrinsic interest, this material is crucial to the two applications of 
algebra we discuss in Chapter 8: Galois theory (on solving polynomial equations) 
and coding theory (on correcting errors in message transmission). 


7.15 Derivatives and repeated roots. Every student of calculus learns
to differentiate polynomials. Using, for brevity, the notation Df for df/dx, we
have:
If f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0,
then Df(x) = n a_n x^{n-1} + (n − 1) a_{n-1} x^{n-2} + ··· + a_1.

This statement makes sense for polynomials over any field (or indeed any ring),
as long as we do not try to use arguments involving limits. (As usual, we take
na to mean a + a + ··· + a (n terms) for positive integers n.)

In the spirit of the subject, we give an axiomatic treatment. 


Theorem 7.34 For any field F, there is a unique F-linear map D : F[x] →
F[x] satisfying the following two conditions:


(Der1) D(fg) = f · (Dg) + (Df) · g;
(Der2) Dx = 1.


Proof We have
D1 = D(1 · 1) = 1 · (D1) + (D1) · 1,


so D1 = 0. By linearity, Dc = 0 for all c ∈ F. Also, D(x^2) = x · (Dx) + (Dx) · x =
2x; and an easy induction argument then shows that D(x^n) = n x^{n-1} for all
positive integers n. Hence, by linearity, D is given by the formula quoted earlier.

It remains to show that the map defined this way is F-linear and satisfies
(Der1) and (Der2). The linearity and (Der2) are obvious. (Der1) follows by
linearity if we can prove it in the case where f and g are powers of x: and this
is done as follows:


D(x^m · x^n) = D(x^{m+n}) = (m + n) x^{m+n-1},

x^m · D(x^n) + D(x^m) · x^n = x^m · n x^{n-1} + m x^{m-1} · x^n = (m + n) x^{m+n-1}.


The use that we make of the derivative is the following. Contrary to one’s
expectation, perhaps, it can happen that, if f(x) is an irreducible polynomial
over F, then f can have two equal roots in some larger field. We want to decide
when this can happen. So first we give a test for repeated roots.


Theorem 7.35 A polynomial f(x) ∈ F[x] has repeated roots (possibly in an
extension field of F) if and only if the greatest common divisor of f and Df is
not 1.


Remark The greatest common divisor is computed in F[x] by Euclid’s Algorithm,
as usual. If K is a larger field than F, we could make believe that this
calculation was taking place in K[x]; the answer is the same, and lies in F[x].
So extending the field does not change the g.c.d.


Proof Suppose that α is a repeated root of f, so that f(x) = (x − α)^2 g(x) in
some extension field of F. Then Df = 2(x − α)g + (x − α)^2 · Dg. Hence (x − α)
divides both f and Df, and their g.c.d. is not 1. (By our remark above, we do
not have to specify the field in the last statement.)

Conversely, suppose that f has no repeated roots: so, in some extension field,


f(x) = c(x − α_1)(x − α_2) ··· (x − α_n),

where α_1, α_2, ..., α_n are all distinct. Up to a constant factor, any divisor of f is
a product of some of the factors (x − α_i). But D(x − α_i) = 1. So the product
rule for the derivative shows that Df is the sum of n terms, where the ith term
is c times the product of all (x − α_j) for j ≠ i. Then (x − α_i) divides all terms
except the ith, but does not divide the ith; so (x − α_i) does not divide Df. Thus
the g.c.d. of f and Df is 1.
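
The test in Theorem 7.35 is entirely mechanical, which the following sketch tries to show over the field Z_p. It assumes polynomials are stored as coefficient lists with the lowest degree first; the helper names (trim, deriv, poly_mod, poly_gcd) are invented for this illustration, and pow(c, -1, p) needs Python 3.8 or later.

```python
def trim(f):
    """Drop zero coefficients of highest degree."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f


def deriv(f, p):
    """Formal derivative of f over Z_p."""
    return trim([(i * c) % p for i, c in enumerate(f)][1:])


def poly_mod(f, g, p):
    """Remainder of f on division by g over Z_p (g reduced and non-zero)."""
    f = trim([c % p for c in f])
    inv = pow(g[-1], -1, p)
    while len(f) >= len(g):
        q = (f[-1] * inv) % p
        shift = len(f) - len(g)
        f = trim([(c - q * g[i - shift]) % p if i >= shift else c
                  for i, c in enumerate(f)])
    return f


def poly_gcd(f, g, p):
    """Monic g.c.d. of f and g over Z_p by Euclid's Algorithm."""
    f, g = trim([c % p for c in f]), trim([c % p for c in g])
    while g:
        f, g = g, poly_mod(f, g, p)
    inv = pow(f[-1], -1, p)
    return [(c * inv) % p for c in f]


f = [2, -3, 0, 1]                      # x^3 - 3x + 2 = (x - 1)^2 (x + 2)
print(poly_gcd(f, deriv(f, 5), 5))     # g.c.d. over Z_5
h = [1, 0, 1]                          # x^2 + 1, irreducible over Z_3
print(poly_gcd(h, deriv(h, 3), 3))
```

In the first case the g.c.d. is x − 1 (printed as [4, 1], since −1 = 4 in Z_5), detecting the repeated root; in the second it is 1.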


The characteristic of a field F is defined as follows: For positive integers
n, let n · 1 = 1 + ··· + 1 (n terms). If there is a positive integer n such that
n · 1 = 0, then the characteristic is the least such n; otherwise the characteristic
is zero. More concisely, it is the unique non-negative integer m generating the
ideal {n ∈ Z : n · 1 = 0} of Z.


Proposition 7.36 The characteristic of a field is zero or a prime.


Proof Suppose that the characteristic is n, and that n = rs, with r, s > 1.
Then 0 = n · 1 = (r · 1)(s · 1), so either r · 1 = 0 or s · 1 = 0, contradicting the
minimality of n.


Theorem 7.37 Let f be an irreducible polynomial over the field F, and suppose
that f has repeated roots in an extension of F. Then F has non-zero characteristic
p (a prime), and there is a polynomial g ∈ F[x] such that f(x) = g(x^p).


Proof Let f have repeated roots. Then (f, Df) ≠ 1. But f is irreducible, so f
divides Df. This implies that Df = 0; for, if not, then deg(Df) < deg(f), but
the divisibility implies deg(Df) ≥ deg(f).

Each term a_i x^i of f gives rise to a term i a_i x^{i-1} of Df. Since Df = 0, all of
these terms must be zero. So, for every i, either i = 0 or a_i = 0. (Here i = 0
means that the element i · 1 of F is equal to 0.)

If F has characteristic zero, then i · 1 = 0 only if i = 0; so f is a constant
polynomial, which contradicts the hypothesis that it is irreducible. So we may
assume that F has non-zero characteristic p. Now i · 1 = 0 only if i is divisible
by p. So the only terms appearing in f are those a_i x^i for which i is a multiple
of p. This means that f(x) is a polynomial in x^p, as claimed. (Note that, if
f(x) = g(x^p), then Df = 0.)


Let F be a field of prime characteristic p. We define the Frobenius map φ
on F to be the pth power map: cφ = c^p for all c ∈ F.


Proposition 7.38 The Frobenius map is an endomorphism of F (a 
homomorphism from F to F). 


Proof We have to show that


(a + b)^p = a^p + b^p,


(ab)^p = a^p b^p.

The second equation is obvious. For the first, we use the Binomial Theorem:

(a + b)^p = a^p + \binom{p}{1} a^{p-1} b + \binom{p}{2} a^{p-2} b^2 + ··· + b^p.


The first and last terms give a^p + b^p, which is what we require. All the intermediate
terms include binomial coefficients \binom{p}{i} for 1 ≤ i ≤ p − 1. Now
\binom{p}{i} = p!/(i!(p − i)!), and the numerator (but not the denominator) is divisible
by p; so p divides \binom{p}{i}, whence \binom{p}{i} · c = 0 for any c ∈ F. The result is proved.
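
The divisibility of the intermediate binomial coefficients is easy to observe numerically, and so is its failure for a composite modulus. The sketch below works in Z_7 and, for contrast, in the ring Z_6; the choice of 7 and 6 is arbitrary, and math.comb needs Python 3.8 or later.

```python
from math import comb

p = 7
# every intermediate binomial coefficient is divisible by the prime p
middle = [comb(p, i) % p for i in range(1, p)]

# (a + b)^p = a^p + b^p for all a, b in Z_p
additive = all(pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
               for a in range(p) for b in range(p))

# for the composite modulus 6 the argument breaks down
n = 6
middle_composite = [comb(n, i) % n for i in range(1, n)]
additive_composite = all(pow(a + b, n, n) == (pow(a, n, n) + pow(b, n, n)) % n
                         for a in range(n) for b in range(n))

print(middle, additive)
print(middle_composite, additive_composite)
```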


Since the Frobenius map is a homomorphism, its kernel is an ideal of F. But
the only ideals of the field F are F and {0}. Clearly 1φ = 1, so the kernel is not
F. Thus, Ker(φ) = {0}, and φ is one-to-one.

In general, φ is not necessarily onto. However, things are much simpler if it is.
Accordingly, we make a definition:


Definition The field F is said to be perfect if either: 


(a) F has characteristic zero; or 
(b) F has non-zero characteristic p, and every element of F has a pth root in F.


Note that the condition in (b) says precisely that the Frobenius map is onto,
and hence is an automorphism of F.
The connection with repeated roots is as follows: 


Theorem 7.39 Let F be a perfect field. Then an irreducible polynomial over 
F has no repeated roots in any extension field of F. 


Proof Let F be perfect, and suppose (for a contradiction) that f is irreducible
and has repeated roots. By Theorem 7.37, f(x) = g(x^p) for some polynomial
g. Let


g(x) = a_n x^n + ··· + a_1 x + a_0.


Since F is perfect, we can choose b_0, b_1, ..., b_n such that b_i^p = a_i for i = 0, ..., n.
Now set


h(x) = b_n x^n + ··· + b_1 x + b_0.

Since the Frobenius map is a homomorphism, we have

h(x)^p = b_n^p x^{np} + ··· + b_1^p x^p + b_0^p = g(x^p) = f(x).


Since f is the pth power of a polynomial in F[x], it is not irreducible, contrary
to assumption.


It is quite difficult to find an imperfect field (see Exercise 7.31). By definition, 
all fields of characteristic zero are perfect. Also, the following holds: 


Proposition 7.40 A finite field is perfect. 

Proof The Frobenius map is always one-to-one; and a one-to-one map from a
finite set to itself is necessarily onto.


7.16 Splitting fields. A splitting field of a polynomial f is a ‘smallest’ field
containing all the roots of f. This is only defined over a ‘base field’ F containing
the coefficients of f. For example, if we regard x^2 + 1 as a real polynomial, its
splitting field is C; but if we regard it as a rational polynomial, its splitting field
is the much smaller field Q(i) = {a + bi : a, b ∈ Q}.

Our goal in this section is to show that splitting fields exist and are unique 
(over a specified base field), up to isomorphism. But, because the base field is 
part of the data, we want to redefine the concept of isomorphism slightly: 


Definition Let K, L be fields containing a subfield F. An F-isomorphism
θ : K → L is a field isomorphism from K to L which satisfies cθ = c for all
c ∈ F.


Remark If we regard K and L as F-vector spaces, then an F-isomorphism is 
an F-linear transformation between them. 


Definition Let f be a polynomial of degree n > 0 over F. A splitting field
of f over F is a field K containing F such that


(a) f(x) = c(x − α_1) ··· (x − α_n) in K[x] (so f ‘splits’ into linear factors
over K);

(b) no proper subfield of K containing F has this property (so K is generated
by F and the roots α_1, ..., α_n of f).


Theorem 7.41 Let f be a non-constant polynomial over F. Then f has a
splitting field over F; and any two such splitting fields are F-isomorphic.


Proof It is easy to see that there is a splitting field. For we can adjoin a root
of an irreducible polynomial to a field, as explained in Section 2.16. Now adjoin
a root α of an irreducible factor of f. Over F(α), we have f(x) = (x − α)g(x),
where deg(g) = deg(f) − 1. Inductively add roots of g until f splits into linear
factors. Now take the smallest field containing F and all the roots of f; this will
be a splitting field.


To prove the uniqueness up to F-isomorphism, we actually prove something 
which looks much more complicated than this, but is designed to streamline the 
induction: 


Proposition 7.42 Let θ : F → F_1 be an isomorphism of fields. Let f(x) be a
polynomial over F, and f_1(x) the corresponding polynomial over F_1 (that is, if
f(x) = Σ a_i x^i, then f_1(x) = Σ (a_i θ) x^i). Let K and K_1 be splitting fields of f
and f_1 over F and F_1 respectively. Then there is an isomorphism φ : K → K_1
whose restriction to F is θ.


Proof We need to know that each step (adjoining one root of an irreducible
polynomial) produces a unique field up to F-isomorphism. For this, we use the
fact that F(α) is F-isomorphic to the ‘standard’ extension F[x]/(f), by the map
taking the coset (f) + g to the element g(α).

It follows that, if θ : F → F_1 is an isomorphism of fields, f an irreducible
polynomial over F, f_1 the corresponding polynomial over F_1, and α and α_1
roots of f and f_1 respectively, then there is an isomorphism from F(α) to F_1(α_1)
extending θ. For θ induces in an obvious way an isomorphism from F[x] to F_1[x],
which we shall also call θ, and maps f to f_1, and hence takes the ideal (f) of
F[x] to the ideal (f_1) of F_1[x]. Thus, θ : F → F_1 extends to an isomorphism
θ : F[x]/(f) → F_1[x]/(f_1). Now we obtain the required map by composing:


• the F-isomorphism from F(α) to F[x]/(f);
• θ;
• the F_1-isomorphism from F_1[x]/(f_1) to F_1(α_1).


With this technical detail out of the way, the proof of the proposition (by
induction on the degree of f) is straightforward. Let θ : F → F_1 be an isomorphism.
Let α be a root of f in K, and α_1 a root of the corresponding
irreducible factor of f_1 in K_1. Then, as noted, θ extends to an isomorphism
ψ : F(α) → F_1(α_1), which maps α to α_1.

Now let f(x) = (x − α)g(x) (in F(α)[x]), and f_1(x) = (x − α_1)g_1(x) (in
F_1(α_1)[x]). Then g and g_1 are corresponding polynomials under ψ, and K and K_1
are splitting fields of g and g_1 over F(α) and F_1(α_1) respectively. By induction,
ψ extends to an isomorphism φ : K → K_1; and φ extends θ, as required.

Taking F = F_1 and θ to be the identity map, we obtain the theorem.


7.17 Finite fields. We have seen that finite groups have a rich and varied 
structure, so that we cannot say, even to a very good approximation, how many 
there are of any given order. Finite fields, however, are much more restricted. In 
this section we will give the complete classification of finite fields, due to Galois, 
and investigate some of their properties. 


Theorem 7.43 (Galois’ Theorem on finite fields) The order of a finite 
field is a prime power. 

Conversely, there is a unique finite field of any given prime power order (up 
to isomorphism). 


Proof Let F be a finite field.

The characteristic of F must be non-zero. For the elements n · 1, for n > 0,
cannot all be distinct; and, if m · 1 = n · 1, with m ≠ n, then (m − n) · 1 = 0.

Let the characteristic be p (noting that p is prime). Now the elements n · 1,
for n = 0, 1, ..., p − 1, form a subfield F_p of F isomorphic to Z_p. (The map
n ↦ n · 1 is a ring homomorphism from Z to F: its kernel is pZ, by definition of
the characteristic.)

Now F is an extension field of F_p, and so is a vector space over F_p. Clearly,
it has finite dimension, say n. Then F is isomorphic (as F_p-vector space) to the
space F_p^n of all n-tuples of elements of F_p. This isomorphism tells us about the
addition in F, but not the multiplication; so we have more work to do. But at
least we know that |F| = p^n is a prime power; so the first part of the theorem is
established.

We also see that, if a field F has order a power of the prime p, then it has
characteristic p, and contains a subfield isomorphic to F_p.

Now we show that a field of order p^n, if it exists, is unique. The p^n − 1
non-zero elements of F form the multiplicative group; by Lagrange’s Theorem,
c^{p^n - 1} = 1 for any non-zero element c ∈ F. Hence c^{p^n} = c for any such c. But
this also holds for c = 0. We conclude that the polynomial x^{p^n} − x has all the p^n
elements of F as its roots. So F is a splitting field for this polynomial over F_p
(any smaller field could not contain all the roots!). By the uniqueness of splitting
fields (Theorem 7.41), F is unique, up to isomorphism.

It remains to show that fields of all possible prime power orders exist. This
can be done by showing that, for any n, there is an irreducible polynomial of
degree n over F_p. However, we now have the machinery in place for a simpler
proof.

Let p be prime and n a positive integer. Let F_p = Z_p, and let F be the
splitting field of the polynomial x^{p^n} − x over F_p. We will show that |F| = p^n.

We have D(x^{p^n} − x) = p^n x^{p^n - 1} − 1 = −1, since the characteristic is p. Hence
x^{p^n} − x is coprime with its derivative, and so it has p^n distinct roots in its
splitting field F. Let S be the set of these roots. We show that S is a field. By
minimality of the splitting field, it follows that S = F, and so that |F| = p^n, as
required.

Let a and b be roots of x^{p^n} − x; that is, a^{p^n} = a and b^{p^n} = b. We have to show
that a + b, ab, and (if a ≠ 0) 1/a, are also roots. For this purpose, we use the
Frobenius map, which is a homomorphism φ of F defined by cφ = c^p. Applying
φ n times, we have cφ^n = c^{p^n}. This is also a homomorphism; so


(a + b)^{p^n} = (a + b)φ^n = aφ^n + bφ^n = a^{p^n} + b^{p^n} = a + b,
(ab)^{p^n} = (ab)φ^n = (aφ^n)(bφ^n) = a^{p^n} b^{p^n} = ab,
(1/a)^{p^n} = (1/a)φ^n = 1/(aφ^n) = 1/a^{p^n} = 1/a,


the last equation holding if a ≠ 0. So a + b, ab, and 1/a (if a ≠ 0) are roots of
x^{p^n} − x, as required.
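
For a concrete feel, here is a small model of a field of order 9, built (as an assumption of this sketch, not the construction used in the proof) as Z_3[x]/(x^2 + 1), which works because −1 is not a square mod 3. A pair (a, b) stands for a + bi with i^2 = −1; the function names are invented for the illustration.

```python
from itertools import product

P = 3
ELEMENTS = list(product(range(P), repeat=2))   # (a, b) stands for a + b*i


def add(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)


def mul(u, v):
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def power(u, k):
    result = (1, 0)
    for _ in range(k):
        result = mul(result, u)
    return result


def frob(u):
    """The Frobenius map c -> c^p."""
    return power(u, P)


# every element is a root of x^9 - x
all_roots = all(power(c, P ** 2) == c for c in ELEMENTS)
# the Frobenius map respects addition, and applying it twice is the identity
frob_additive = all(frob(add(u, v)) == add(frob(u), frob(v))
                    for u in ELEMENTS for v in ELEMENTS)
frob_order_two = (all(frob(frob(c)) == c for c in ELEMENTS)
                  and any(frob(c) != c for c in ELEMENTS))
print(all_roots, frob_additive, frob_order_two)
```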


The unique field of order p^n is called the Galois field GF(p^n), after its
discoverer. (Sometimes the notation F_{p^n} is used instead.)
We prove some structural properties of Galois fields.


Theorem 7.44 Let p and q be primes, m and n positive integers.
(a) The additive group of GF(p^n) is isomorphic to the direct sum of n cyclic
groups of order p.
(b) The multiplicative group of GF(p^n) is cyclic of order p^n − 1.
(c) The automorphism group of GF(p^n) is cyclic of order n, generated by the
Frobenius map.
(d) GF(q^m) is a subfield of GF(p^n) if and only if p = q and m divides n.


Proof (a) In the proof of Galois’ Theorem, we worked out that GF(p^n) is
additively isomorphic to (Z_p)^n. This is exactly what is required for (a).

(b) We will prove a more general result after this theorem.

(c) The Frobenius map φ is an automorphism. (This is a translation of the
fact that GF(p^n), being a finite field, is perfect.) Now φ^n is the identity map on
F = GF(p^n): for φ^n maps each element c to c^{p^n}, and we showed that c^{p^n} = c
for all c ∈ F. No smaller power of φ is the identity: for φ^m maps c to c^{p^m}, and
the equation x^{p^m} = x has at most p^m roots for m < n. So φ generates a cyclic
group of order n of automorphisms of F.

It is harder to show that this is the full automorphism group. This follows
from a theorem that we will meet when we consider Galois Theory in Chapter 8.
Here is a more direct proof. By (b), the multiplicative group of F = GF(p^n)
is cyclic, generated (say) by a. Now F = F_p(a), since no proper subfield can
contain a. Since [F : F_p] = n, the element a satisfies a polynomial of degree
n over F_p. Any automorphism of F must map a to one of the n roots of this
polynomial. Only the identity can fix a, since an automorphism fixing a must
fix every power of a. So different automorphisms map a to different roots, and
there are at most n automorphisms. Since we have a group of order n already
(generated by the Frobenius map), it must be the full automorphism group.

(d) Suppose that GF(q^m) is a subfield of GF(p^n). Applying Lagrange’s
Theorem to the additive groups shows that q^m divides p^n; so p = q (since p
and q are prime). Applying Lagrange’s Theorem to the multiplicative group
shows that p^m − 1 divides p^n − 1. We claim that this implies that m divides n.

Let n = mt + r, where 0 ≤ r ≤ m − 1. Since x − 1 divides x^t − 1 for any
integer x, we see that p^m − 1 divides p^{mt} − 1, and hence divides p^n − p^r. It also
divides p^n − 1 by assumption; so it divides p^r − 1. But 0 ≤ p^r − 1 < p^m − 1; so
we must have p^r − 1 = 0, whence r = 0 and m divides n.
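
The arithmetic fact used in part (d), that p^m − 1 divides p^n − 1 exactly when m divides n, can be checked by brute force for small cases, together with the g.c.d. identity of Exercise 7.33. This is a numerical sanity check only; the function name and the ranges are arbitrary choices.

```python
from math import gcd


def check(p, limit):
    """Compare divisibility of p^n - 1 by p^m - 1 with divisibility of n by m."""
    for n in range(1, limit + 1):
        for m in range(1, limit + 1):
            divides = (p ** n - 1) % (p ** m - 1) == 0
            if divides != (n % m == 0):
                return False
            # the g.c.d. of p^m - 1 and p^n - 1 is p^k - 1 with k = gcd(m, n)
            if gcd(p ** m - 1, p ** n - 1) != p ** gcd(m, n) - 1:
                return False
    return True


print(check(2, 12), check(3, 10), check(5, 8))
```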


Part (b) follows from a more general result: 


Proposition 7.45 A finite subgroup of the multiplicative group of a field is 
cyclic. 


Proof We give two proofs. Both depend on the fact that a field contains at
most n different nth roots of unity. For an nth root of unity satisfies the equation
x^n − 1 = 0, and this polynomial of degree n has at most n roots.


First proof This proof uses Theorem 5.14, the structure theorem for finitely
generated abelian groups. Let G be a subgroup of order n of the multiplicative
group of a field F. Then


G ≅ C_{d_1} × C_{d_2} × ··· × C_{d_k},

where d_1 d_2 ··· d_k = n and d_i divides d_{i+1} for i = 1, ..., k − 1. (We use
multiplicative, rather than additive, notation, since the group operation is
multiplication.) Let p be a prime dividing d_1. Then p divides d_i for all i. So each factor
C_{d_i} contains a cyclic subgroup of order p. Each subgroup contains p − 1 elements
of order p. Together with the identity, we obtain (at least) 1 + k(p − 1) elements
of order dividing p. Hence 1 + k(p − 1) ≤ p, whence k = 1 and G is cyclic.


Second proof This proof is more elementary. Let ψ(m) be the number of
elements of order precisely m in G. Also, let φ(m) be Euler’s function, the number
of non-negative integers less than m which are coprime to m. We show that:


(a) Σ_{m|n} φ(m) = n;
(b) Σ_{m|n} ψ(m) = n;
(c) ψ(m) ≤ φ(m) for all m | n.


It follows that ψ(m) = φ(m) for all m | n. In particular, ψ(n) = φ(n) > 0. So G
contains an element of order n, and must be cyclic.

Proof of (a): We ask, how many non-negative integers k < n have the property
that the g.c.d. of k and n is n/m, for any divisor m of n? Putting e = n/m, we
see that the g.c.d. of k/e and n/e is 1; so there are φ(n/e) = φ(m) such integers.
Summing over m must give n, since all the integers 0, 1, ..., n − 1 occur.

Proof of (b): Each element of G has some order which divides n.

Proof of (c): This is obvious if ψ(m) = 0, so suppose not. Then there is an
element of order m in G. It generates a cyclic group H of order m, containing
m solutions of x^m = 1. So all solutions of x^m = 1 lie in H. In particular, all
elements of order precisely m lie in H. But a cyclic group of order m contains
exactly φ(m) elements of order m. (If H = ⟨h⟩, then h^l has order m if and only
if (l, m) = 1.) So ψ(m) = φ(m) in this case.
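
The counting in the second proof can be watched in action for G the multiplicative group of Z_p. The sketch below tabulates ψ(m) by brute force and compares it with Euler's function; the prime 31 and the function names are choices made for this illustration.

```python
from math import gcd


def euler_phi(m):
    """Number of non-negative integers less than m which are coprime to m."""
    return sum(1 for k in range(m) if gcd(k, m) == 1)


def order(g, p):
    """Multiplicative order of g modulo the prime p (g not divisible by p)."""
    k, x = 1, g % p
    while x != 1:
        x = (x * g) % p
        k += 1
    return k


def order_counts(p):
    """psi(m) for each m: how many elements of Z_p \\ {0} have order exactly m."""
    counts = {}
    for g in range(1, p):
        m = order(g, p)
        counts[m] = counts.get(m, 0) + 1
    return counts


p = 31
psi = order_counts(p)
phi = {m: euler_phi(m) for m in range(1, p) if (p - 1) % m == 0}
print(psi == phi, psi[p - 1])
```

The second printed value is the number of generators of the group; it is positive, so the group is cyclic.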


Proposition 7.46 Let q be a prime power. Then the polynomial x^{q^n} − x over
GF(q) is the product of all the monic irreducible polynomials over GF(q) whose
degrees divide n.


Proof The roots of x^{q^n} − x are all the elements of GF(q^n). Each of these
generates GF(q^m) for some m dividing n, and hence satisfies an irreducible polynomial
of degree m. Conversely, any root of an irreducible polynomial of degree
m dividing n generates GF(q^m), and hence is contained in GF(q^n).


Example Let q = 2 and n = 4. There are two irreducible polynomials of degree
1 over GF(2) = Z_2, namely x and x − 1. There is one irreducible of degree 2,
whose roots are the two elements of GF(4) \ GF(2); namely, x^2 + x + 1; and


x^4 − x = x(x − 1)(x^2 + x + 1).


There are three irreducibles of degree 4, namely x^4 + x + 1, x^4 + x^3 + 1, and
x^4 + x^3 + x^2 + x + 1; the product of these three polynomials with x^4 − x is x^16 − x.


Roots of the third irreducible have order 5, since


(x − 1)(x^4 + x^3 + x^2 + x + 1) = x^5 − 1.

Roots of the other two irreducibles are primitive; that is, they have order 15 and
generate the multiplicative group of GF(16).
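
The factorisation in this example can be multiplied out mechanically. One convenient encoding (an assumption of this sketch) stores a polynomial over GF(2) as a Python integer whose bit i is the coefficient of x^i, so that addition is XOR and multiplication is a carry-less product.

```python
def gf2_mul(a, b):
    """Multiply two polynomials over GF(2), encoded as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add the current shifted copy of a
        a <<= 1
        b >>= 1
    return result


# x, x + 1, x^2 + x + 1, x^4 + x + 1, x^4 + x^3 + 1, x^4 + x^3 + x^2 + x + 1
factors = [0b10, 0b11, 0b111, 0b10011, 0b11001, 0b11111]
total = 1
for f in factors:
    total = gf2_mul(total, f)

print(bin(total))                      # bits 16 and 1 set: x^16 + x
print(bin(gf2_mul(0b11, 0b11111)))     # (x + 1)(x^4 + x^3 + x^2 + x + 1)
```

Over GF(2) the signs disappear, so x^16 + x is the same polynomial as x^16 − x, and x^5 + 1 the same as x^5 − 1.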


7.18 Wedderburn’s Theorem. Wedderburn’s Theorem allows us to 
extend our classification of finite fields to a classification of finite division rings 
very cheaply. 


Theorem 7.47 (Wedderburn’s Theorem) A finite division ring is a field 
(that is, it is commutative). 


Proof We need two preliminaries. The first concerns cyclotomic polynomials.
The nth cyclotomic polynomial Φ_n(x) is the unique monic polynomial whose
roots are the primitive nth roots of unity in C. Since every nth root of unity is
a primitive mth root for some divisor m of n, we have


x^n − 1 = ∏_{m|n} Φ_m(x).


By induction, we see that Φ_n(x) is a polynomial over Z. Its degree is Euler’s
function φ(n), the number of congruence classes mod n which are coprime to n.

The second is a revision of some group theory. Recall from Section 7.1 that
any group G is a union of conjugacy classes, where g and h are conjugate if
and only if h = x^{-1} g x for some x ∈ G. The number of elements in the conjugacy
class of g is |G : C_G(g)|, where C_G(g) is the subgroup


{x ∈ G : xg = gx}.

Now let F be a finite division ring. It is easy to check that the centre

Z(F) = {x ∈ F : xa = ax for all a ∈ F}

is a subfield of F. Moreover, for any a ∈ F, its centraliser

C_F(a) = {x ∈ F : xa = ax}


is a sub-division ring of F. Moreover, F itself, and any centraliser C_F(a), is a
vector space over Z(F) (with the given addition and scalar multiplication by
elements of Z(F)).

Let |Z(F)| = q, a prime power (since Z(F) is a finite field). Then |F| = q^n,
where n is the dimension of F as a Z(F)-vector space. Choose representatives
a_1, ..., a_r for the conjugacy classes (in the multiplicative group) of elements not
in Z(F). Suppose that C_F(a_i) has dimension m_i over Z(F), so that |C_F(a_i)| =
q^{m_i}. Then the centraliser of a_i in the multiplicative group of F has order q^{m_i} − 1,
so the size of the conjugacy class of a_i is (q^n − 1)/(q^{m_i} − 1). Since every element
of F \ Z(F) lies in just one such class, we have the class equation


q^n − 1 = q − 1 + Σ_{i=1}^{r} (q^n − 1)/(q^{m_i} − 1).

Now Φ_n(x) divides x^n − 1 (in Z[x]), and also Φ_n(x) divides (x^n − 1)/(x^{m_i} − 1)
as m_i < n. So, in Z, the non-zero integer Φ_n(q) divides q^n − 1 and
(q^n − 1)/(q^{m_i} − 1) for each i. It follows from the class equation that Φ_n(q) divides q − 1.

But this is impossible for n > 1. (If n = 2, then Φ_2(q) = q + 1. If n > 2, then
φ(n) > 1, and Φ_n(q) is the product of φ(n) terms of the form (q − ω), where ω
is a primitive nth root of unity; each such factor has absolute value larger than q − 1.)

So we must have n = 1, whence F = Z(F) is commutative.
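
The key inequality, that Φ_n(q) is too large to divide q − 1 when n > 1, can be observed for small values. The sketch below computes Φ_n from the product formula by exact division of integer polynomials (coefficient tuples, lowest degree first); the function names and the ranges tested are choices made for this illustration.

```python
from functools import lru_cache


def poly_divexact(num, den):
    """Exact division of integer polynomials; den must be monic."""
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        quot[i] = num[i + len(den) - 1]
        for j, d in enumerate(den):
            num[i + j] -= quot[i] * d
    return quot


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of the nth cyclotomic polynomial, lowest degree first."""
    poly = [-1] + [0] * (n - 1) + [1]            # x^n - 1
    for m in range(1, n):
        if n % m == 0:
            poly = poly_divexact(poly, cyclotomic(m))
    return tuple(poly)


def evaluate(poly, q):
    return sum(c * q ** i for i, c in enumerate(poly))


print(cyclotomic(6))          # x^2 - x + 1
too_big = all(abs(evaluate(cyclotomic(n), q)) > q - 1
              for n in range(2, 31) for q in range(2, 10))
print(too_big)
```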


Exercise 7.31 Define the field of rational functions over a field F to be the field
of fractions of the polynomial ring F[x]. (Its elements are of the form f(x)/g(x), where
f and g are polynomials and g ≠ 0.) Denote it by F(x).

Prove that the element x has no pth root in F(x), for p > 1. Deduce that, if F has
characteristic p, then F(x) is imperfect.

Now let F have characteristic p = 2, and let K = F(x). Show that the polynomial
y^2 − x in K[y] is irreducible and has repeated roots.


Exercise 7.32 (a) Prove by induction that, if f(x) ∈ F[x] has degree n and splitting
field K, then [K : F] ≤ n!.
(b) (*) Prove by induction that, with the same hypotheses, [K : F] divides n!.


Exercise 7.33 Let p be prime. Show that the g.c.d. of p^m − 1 and p^n − 1 is p^k − 1,
where k is the g.c.d. of m and n.


Exercise 7.34 (a) Let a_n be the number of monic irreducible polynomials of degree n
over GF(q). Prove that


Σ_{m|n} m a_m = q^n.


Hence, in the case q = 2, calculate a_n for n ≤ 6.

(b) Let b_n be the number of monic primitive irreducible polynomials of degree n
over GF(q) (that is, polynomials any one of whose roots generates the multiplicative
group). Prove that


b_n = φ(q^n − 1)/n,


where φ is Euler’s function. Calculate b_n for q = 2 and n ≤ 6.
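
The identity in part (a) determines the numbers a_n recursively, since the term with m = n can be isolated once the smaller values are known. A possible way to organise the calculation is sketched below; the function name is made up, and the code is an aid for checking hand computation rather than a substitute for the proof.

```python
def irreducible_counts(q, limit):
    """Solve sum over m | n of m * a_m = q^n for a_1, ..., a_limit."""
    a = {}
    for n in range(1, limit + 1):
        smaller = sum(m * a[m] for m in range(1, n) if n % m == 0)
        a[n] = (q ** n - smaller) // n
    return a


counts = irreducible_counts(2, 6)
print(counts)
```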


Other structures 


The title of this section could mean one of two things. So far, we have con- 
centrated on groups, rings, fields, vector spaces, and modules. There are a few 
important types of algebras (though less important than those just listed) which 
have been studied: Lie algebras are perhaps the most notable of these. One 
approach would be a Cook’s tour through some of these. 

Another approach is to look for unifying principles in algebra. The following 
sections do that. First, we examine the notion of an algebra, as a set on which 
various operations are defined, and make the most general definition possible. 
At first sight, it is surprising how much elementary group theory and ring theory 
can be developed at this level of generality. 

The last two sections are more radical departures. In the second, we examine 
algebras from the viewpoint of their subalgebras and congruences (kernels of 
homomorphisms), independently of the actual operations. Finally, we come to 
the viewpoint that knowing the homomorphisms tells us all about a class of 
algebras; and we do not need to know them as functions, but merely the rule for 
composition. In this way, we find ourselves doing elementary algebra again, but 
having climbed further up the mountain to reach a higher viewpoint. 


7.19 Universal algebra. An algebra is a set carrying various operations.
Recall that an n-ary operation on a set A is a function μ : A^n → A. The
integer n is called the arity of μ. Binary operations are often written with infix
notation, as we have seen in the case of groups and rings. In general, this is
not possible, and we write operations on the right, as (x_1, ..., x_n)μ if μ is an
n-ary operation. Given a family of operations with prescribed arities, we consider
a type of algebras with these operations. The type is described by the list of
arities of the operations. In fact, it is possible to dispense with brackets and
commas and write x_1 ··· x_n μ; no ambiguity arises (see Exercise 7.38). But we
will not adopt this convention.

We allow the possibility of nullary operators (of arity zero); these are 
just distinguished elements, sometimes referred to as constants. The identity 
element of a group, and the zero of a ring, are examples. 

The class of all algebras of given type is unlikely to be interesting. So we 
specialise as follows: A law is an expression $w_1 = w_2$, where $w_1$ and $w_2$ are 
expressions involving variables and operations, which properly define elements 
of an algebra A when elements of A are substituted for the variables. A law is 
satisfied in A if the equation is valid for all substitutions of elements of A for 
the variables. Now a variety of algebras is the class of all algebras of a given 
type which satisfy a given collection of laws. 

Many classes of algebras we have met are varieties. 


Example Consider the variety of algebras with a binary operator $\mu$, unary 
operator $\iota$, and nullary operator $\varepsilon$, satisfying the laws 


$((x, y)\mu, z)\mu = (x, (y, z)\mu)\mu$, 
$(x, \varepsilon)\mu = x = (\varepsilon, x)\mu$, 
$(x, x\iota)\mu = \varepsilon = (x\iota, x)\mu$. 


This is just the class of groups: $\mu$ is the group operation, $\iota$ is inversion, and $\varepsilon$ is 
the identity. Thus groups form a variety of algebras of type (0, 1, 2). 
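
Whether a finite algebra with one binary, one unary, and one nullary operation satisfies the associative, identity, and inverse laws of a group can be tested by brute force. The following is a minimal sketch; the names `satisfies_group_laws`, `mu`, `iota`, and `eps` are chosen here for illustration only.

```python
from itertools import product

def satisfies_group_laws(A, mu, iota, eps):
    """Brute-force check of the three group laws on a finite carrier set A."""
    assoc = all(mu(mu(x, y), z) == mu(x, mu(y, z))
                for x, y, z in product(A, repeat=3))
    ident = all(mu(x, eps) == x == mu(eps, x) for x in A)
    inver = all(mu(x, iota(x)) == eps == mu(iota(x), x) for x in A)
    return assoc and ident and inver

Z5 = range(5)
add_mod5 = lambda x, y: (x + y) % 5
neg_mod5 = lambda x: (-x) % 5

# The integers mod 5 under addition satisfy all three laws.
is_group = satisfies_group_laws(Z5, add_mod5, neg_mod5, 0)

# Subtraction is not associative, so the same algebra type fails the laws.
sub_is_group = satisfies_group_laws(Z5, lambda x, y: (x - y) % 5, neg_mod5, 0)
```

Both algebras have the same type; only the first belongs to the variety of groups.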

Similarly, abelian groups, rings, commutative rings, rings with identity, and 
so forth, form varieties (Exercise 7.35). 

Other varieties are less obvious. In Chapter 2, we considered Boolean rings, 
which satisfy the law xx = x. 

A group G is said to be metabelian if it has a normal subgroup N such 
that both N and G/N are abelian. Now a group is metabelian if and only if it 
satisfies the law 


$[[x_1, x_2], [x_3, x_4]] = 1$, 


where the commutator $[x, y]$ is defined to be $x^{-1}y^{-1}xy$ (Exercise 7.41). So 
metabelian groups form a variety. 
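
As an illustration of testing a law, the sketch below checks the metabelian law on the symmetric group of degree 3 by trying all substitutions. Permutations are represented as tuples and composed left to right; the helper names are assumptions made for this example.

```python
from itertools import permutations, product

def compose(p, q):
    """Apply p first, then q (maps written on the right)."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def commutator(x, y):
    """[x, y] = x^{-1} y^{-1} x y."""
    return compose(compose(inverse(x), inverse(y)), compose(x, y))

def satisfies_metabelian_law(G):
    """Check [[x1, x2], [x3, x4]] = 1 for every substitution from G."""
    e = tuple(range(len(G[0])))
    return all(commutator(commutator(a, b), commutator(c, d)) == e
               for a, b, c, d in product(G, repeat=4))

S3 = list(permutations(range(3)))
s3_metabelian = satisfies_metabelian_law(S3)
```

The check succeeds for this group because all its commutators are even permutations, and the even permutations of three points form an abelian group.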


Example Fields do not form a variety: only non-zero elements have multiplica- 
tive inverses, and this fact cannot be expressed as a law. (This does not prove 
that fields cannot be made into a variety by some clever trickery. But we will see 
later that this is so.) 


The set of operators is not necessarily finite. We do not need specially 
contrived examples for this. 


Example For a given field $F$, the class of vector spaces over $F$ is a variety. 
It has a binary operation (addition), a unary operation (additive inverse), a 
nullary operation (zero), and, for each $c \in F$, a unary operation (multiplication 
by $c$). 


Example Let $G$ be a group. A G-set is an algebra with a unary operator $\mu_g$ 
for each $g \in G$, satisfying the laws 


$x\mu_1 = x$, 
$(x\mu_g)\mu_h = x\mu_{gh}$. 


(The first equation is a law; the second represents one law for each pair (g, h) 
of elements of G.) The G-sets form a variety, which is a familiar one: a G-set is 
just a set on which there is an action of G by permutations (see Section 7.1). 


You may have one of two common reactions at this point. One is a feeling of 
freedom, or licence: anything goes. Indeed, mathematicians have studied a very 
wide variety of varieties, or closely related structures: for example, semigroups, 
quasigroups, partial groups, loops, sloops, squags, semirings, alternative rings, 
Lie rings, near-rings, planar ternary rings, Lie algebras, Jordan algebras, Boolean 
rings, quasifields, near-fields, near-domains, ... 

The other reaction is vertigo at the wide range of subject matter opened up 
by this definition. But, in all the above cases, there is a good mathematical reason 
for considering the class of algebras. I know of no instance where someone wrote 
down a set of axioms out of the blue and invented a lively and important theory. 
Axioms for a class of algebras always follow the introduction of the class for 
other reasons. Each of these classes played some role in mathematics before its 
axiomatic definition: Lie algebras in differential geometry, planar ternary rings 
in the theory of projective planes, Boolean algebras in logic, Jordan algebras in 
physics, and so on. 

A strength of the universal viewpoint in algebra is that many arguments 
recur in similar form in different topics, and it is more efficient to do them once 
in the most general context. We remarked in Chapter 3 that beginning group 
theory (subgroups, homomorphisms, and so on) duplicates similar parts of ring 
theory. In fact, the arguments work much more generally. We learn something 
both from the generality of the arguments and from the modifications needed. 


Definition Let A be an algebra of a given type. A subalgebra of A is a subset 
which is closed under all the operators of A. 


In the case of nullary operators, this asserts that the subalgebra must contain 
all the constants of the algebra. It is clear that a subalgebra of A is an algebra of 
the same type. Moreover, any law that is valid in A also holds in a subalgebra. 
Hence: 


Proposition 7.48 If an algebra A belongs to a variety V, then so does any 
subalgebra of A. 


A homomorphism $\theta : A \to B$, where A and B are algebras of the same 
type, is a map from A to B satisfying 


$((a_1, \ldots, a_n)\mu_A)\theta = (a_1\theta, \ldots, a_n\theta)\mu_B$ 


for all $a_1, \ldots, a_n \in A$, for all n-ary operators $\mu$, and for all n. (In this equation $\mu_A$ 
and $\mu_B$ are the operators on A and B which correspond. In future, we will adopt 
the practice we have used for groups, rings, and every other kind of structure, 
and suppress the subscripts.) Now the image of $\theta$ is a subalgebra of B. Moreover, 
any law which holds in A also holds in the image of $\theta$. 

For rings, groups, vector spaces, and modules, we defined the kernel of a 
homomorphism to be the inverse image of the identity, and showed that it is a 
subalgebra of A. In general, we cannot do this, since there may be no ‘identity’— 
our algebras may have no constants, or several, and even if they exist they may 
not have appropriate properties. 

The clue is the general definition of kernel in Chapter 1, as a partition (two 
elements in the same part if they have the same image). In the above special 
cases, this is the partition into cosets of the simpler ‘kernel’ (the inverse image 
of the identity). In general, we just work with the partition. 


Definition The kernel of a homomorphism $\theta$ is defined as the equivalence 
relation $\mathrm{KER}(\theta)$ which is given by the rule that $(x, y) \in \mathrm{KER}(\theta)$ if and only if 
$x\theta = y\theta$. 


Although none of the parts of this partition may be a subalgebra, it still has 
a property generalised from the coset partition of a normal subgroup. 


Definition A congruence on an algebra A is an equivalence relation $E$ on A 
with the property that, for any n-ary operation $\mu$, if $(a_i, b_i) \in E$ for $i = 1, \ldots, n$, 
then 


$((a_1, \ldots, a_n)\mu, (b_1, \ldots, b_n)\mu) \in E$. 


Now it is clear that the partition $\mathrm{KER}(\theta)$ defined by a homomorphism is a 
congruence on A. 


Definition Given a congruence $E$ on A, the factor algebra $A/E$ is defined 
as follows: the elements of $A/E$ are the classes of $E$; and, if $[a]$ denotes the 
congruence class of $a$, then 


$([a_1], \ldots, [a_n])\mu = [(a_1, \ldots, a_n)\mu]$. 


That this is independent of the choice of representatives follows immediately 
from the definition of a congruence. The map $a \mapsto [a]$ from A to $A/E$ is a 
homomorphism whose kernel is $E$ and whose image is $A/E$. Now we have all the 
ingredients for the First Isomorphism Theorem: 


Theorem 7.49 (First Isomorphism Theorem) Let $\theta : A \to B$ be a 
homomorphism. Then: 


(a) $\mathrm{Im}(\theta)$ is a subalgebra of B; 
(b) $\mathrm{KER}(\theta)$ is a congruence on A; 
(c) $A/\mathrm{KER}(\theta) \cong \mathrm{Im}(\theta)$. 


We note that this theorem works in the class of all algebras of given type, or 
in any variety of algebras. The other isomorphism theorems also generalise, but 
we do not pursue this here. 
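
The three parts of the First Isomorphism Theorem can be checked directly on a small case. The sketch below assumes the additive groups of integers mod 12 and mod 4 and the reduction map between them; the variable names are illustrative.

```python
n, m = 12, 4
A = range(n)
theta = lambda x: x % m                      # candidate homomorphism Z_12 -> Z_4
add_A = lambda x, y: (x + y) % n
add_B = lambda x, y: (x + y) % m

# theta preserves the binary operation.
is_hom = all(theta(add_A(x, y)) == add_B(theta(x), theta(y))
             for x in A for y in A)

# The kernel as an equivalence relation: a set of ordered pairs.
KER = {(x, y) for x in A for y in A if theta(x) == theta(y)}

# Congruence property: related inputs give related outputs.
is_congruence = all((add_A(a1, a2), add_A(b1, b2)) in KER
                    for (a1, b1) in KER for (a2, b2) in KER)

# Congruence classes, and the induced map [a] -> a theta onto the image.
classes = {frozenset(y for y in A if (x, y) in KER) for x in A}
induced = {c: theta(min(c)) for c in classes}
image = {theta(x) for x in A}
is_bijection = (len(set(induced.values())) == len(classes)
                and set(induced.values()) == image)
```

Here the four congruence classes are the cosets of the subgroup {0, 4, 8}, in line with the remark that for groups the kernel partition is the coset partition.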

We have seen that varieties are closed under taking subalgebras and factor 
algebras. They have another closure property as well. 


Definition Let $I$ be a set, and suppose that, for each $i \in I$, we are given an 
algebra $A_i$, all of these algebras having the same type. The Cartesian product 
$\prod_{i \in I} A_i$ is defined to be the set of all functions $f : I \to \bigcup_{i \in I} A_i$ satisfying 
$f(i) \in A_i$ for all $i \in I$. (These functions are choice functions for the family 
$(A_i : i \in I)$ of sets. The Axiom of Choice guarantees that the Cartesian product 
of a family of non-empty sets is non-empty. For this reason, it is sometimes called 
the ‘multiplicative axiom’.) For each n-ary operation $\mu$ in the type, we define 
$(f_1, \ldots, f_n)\mu$ to be the function given by 


$((f_1, \ldots, f_n)\mu)(i) = (f_1(i), \ldots, f_n(i))\mu$. 


Note that $f_1(i), \ldots, f_n(i)$ are elements of the algebra $A_i$, so we can apply to them 
the operation $\mu$ on this algebra. It is again easy to see that, if all the algebras 
$A_i$ belong to a variety V, then so does their Cartesian product. 

This enables us to show that fields, integral domains, and so on do not form 
varieties: the Cartesian product of two fields is not even an integral domain. 
(Representing a function on a 2-element set as an ordered pair of values as 
usual, we see that 


$(a, 0)(0, b) = (0, 0)$, 


so, taking $b \neq 0$, $(a, 0)$ is a zero-divisor if $a \neq 0$.) 

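
A concrete instance of this computation, as a minimal sketch: we take the fields of integers mod 3 and mod 5 (an assumption made for the example) and list the pairs of non-zero elements of their Cartesian product whose product is zero.

```python
from itertools import product

p, q = 3, 5                                   # Z_3 and Z_5 are fields
P = list(product(range(p), range(q)))         # carrier of the Cartesian product
zero = (0, 0)

def mul(u, v):
    """Coordinatewise multiplication in Z_3 x Z_5."""
    return ((u[0] * v[0]) % p, (u[1] * v[1]) % q)

zero_divisor_pairs = [(u, v) for u in P for v in P
                      if u != zero and v != zero and mul(u, v) == zero]
has_zero_divisors = len(zero_divisor_pairs) > 0
```

The pair ((1, 0), (0, 1)) appears in the list, so the product is not an integral domain even though both factors are fields.
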
Remarkably, it turns out that these three closure properties characterise 
varieties: 


Theorem 7.50 A class of algebras (with a fixed set of operators) is a variety if 
and only if it is closed under isomorphism and under taking subalgebras, factor 
algebras, and Cartesian products. 


7.20 Lattices. A lattice is an algebra with two binary operations, $\vee$ (‘join’) 
and $\wedge$ (‘meet’), and two constants, 0 and 1, satisfying the following axioms: 


Idempotent laws: $x \vee x = x = x \wedge x$. 

Commutative laws: $x \vee y = y \vee x$ and $x \wedge y = y \wedge x$. 

Associative laws: $(x \vee y) \vee z = x \vee (y \vee z)$ and $(x \wedge y) \wedge z = x \wedge (y \wedge z)$. 

Identity laws: $x \vee 0 = x = x \wedge 1$. 

Absorptive laws: $x \wedge (x \vee y) = x = x \vee (x \wedge y)$. 


Note that these axioms are unchanged under the exchange of $\vee$ and $\wedge$, and 
0 and 1. In this way, from any lattice $L$ we obtain another lattice $L^*$, the 
dual of $L$. 

Where do lattices come from? We will use them to describe the subalgebras, or 
the congruences, of an arbitrary algebra. But there is a more basic fact underlying 
this: a lattice is a special kind of partially ordered set. Recall that a partial 
order on a set X is a relation $\le$ which is reflexive ($x \le x$), antisymmetric 
($x \le y$ and $y \le x$ imply $x = y$), and transitive ($x \le y$ and $y \le z$ imply $x \le z$). If 
$(X, \le)$ is a partially ordered set, and $x, y \in X$, we say that $u$ is a least upper 
bound, or supremum, of $x$ and $y$ if: 


(a) $x \le u$ and $y \le u$; 
(b) if $x \le v$ and $y \le v$ then $u \le v$. 


Note that a least upper bound, if it exists, is unique; for if $u$ and $u'$ are both 
least upper bounds, then $u \le u'$ and $u' \le u$, whence $u = u'$. Dually, a greatest 
lower bound, or infimum, is an element $w$ such that: 


(a) $w \le x$ and $w \le y$; 
(b) if $z \le x$ and $z \le y$ then $z \le w$. 


Again, if it exists, it is unique. A least element 0 satisfies $0 \le x$ for all $x$; if it 
exists, it is unique. Similarly for a greatest element. 


Theorem 7.51 Let $(X, \le)$ be a partially ordered set. Suppose that 


(a) X has a least element 0 and a greatest element 1; 
(b) any two elements $x, y$ have a least upper bound $x \vee y$ and a greatest lower 
bound $x \wedge y$. 


Then $(X, \vee, \wedge, 0, 1)$ is a lattice. 

Conversely, let $(X, \vee, \wedge, 0, 1)$ be a lattice. Set $x \le y$ if $x \vee y = y$. Then $(X, \le)$ 
is a partially ordered set satisfying conditions (a) and (b) above. 

Moreover, the constructions above are mutually inverse. 


Proof Starting with a partially ordered set with properties (a) and (b), we 
verify the lattice axioms. This is mostly straightforward. In the commutative 
laws, both sides represent the least upper bound (or greatest lower bound) of $x$ 
and $y$. In the absorptive laws, $x \le x \vee y$, so the greatest lower bound of $x$ and 
$x \vee y$ is $x$. 

Conversely, suppose that we are given a lattice. Note first that $x \vee y = y$ if 
and only if $x \wedge y = x$, so either can be chosen as the definition of $x \le y$. This 
follows from the absorptive laws: if $x \vee y = y$, then 


$x \wedge y = x \wedge (x \vee y) = x$. 


We show that the relation $\le$ is a partial order. Reflexivity $x \le x$ follows from 
the idempotent law $x \vee x = x$. Antisymmetry follows from the commutative law: 
if $x \vee y = y$ and $y \vee x = x$, then $x = y$. Transitivity follows from the associative 
law: if $x \vee y = y$ and $y \vee z = z$, then 


$x \vee z = x \vee (y \vee z) = (x \vee y) \vee z = y \vee z = z$. 


Finally, we have to show that the constructions are inverse. Suppose that we 
are given a lattice which arises from a partially ordered set. Then, if $x \le y$, the 
least upper bound of $x$ and $y$ is $y$, so $x \vee y = y$, and conversely. So the partial 
order is uniquely determined. 

In the other direction, we are given a partially ordered set arising from a 
lattice. We have to show that 0 and 1 are least and greatest elements, and that 
$x \vee y$ and $x \wedge y$ are the least upper bound and greatest lower bound of $x$ and $y$. 
The first assertions are trivial: the identity laws $0 \vee x = x$ and $1 \wedge x = x$ imply 
that $0 \le x$ and $x \le 1$ for all $x$. For the second, we have 


$x \vee (x \vee y) = (x \vee x) \vee y = x \vee y$, 


so $x \le x \vee y$. Similarly, $y \le x \vee y$. If $x \le v$ and $y \le v$, then $x \vee v = v$ and 
$y \vee v = v$; so 


$(x \vee y) \vee v = x \vee (y \vee v) = x \vee v = v$, 


whence $x \vee y \le v$. Thus $x \vee y$ is the least upper bound. The argument for the 
greatest lower bound is dual. 


Remark If a lattice $L$ arises from a partially ordered set $(X, \le)$, then its dual 
$L^*$ is obtained from the partially ordered set $(X, \ge)$ obtained by reversing the 
order. 
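
Both directions of Theorem 7.51 can be tried out on a small partially ordered set. The sketch below assumes the divisors of 30 ordered by divisibility; joins and meets are computed from the order alone, the lattice axioms are verified by brute force, and the order is then recovered from the join.

```python
from itertools import product

X = [d for d in range(1, 31) if 30 % d == 0]      # divisors of 30
leq = lambda x, y: y % x == 0                      # x <= y means x divides y

def join(x, y):
    """Least upper bound, found directly from the order."""
    uppers = [u for u in X if leq(x, u) and leq(y, u)]
    return next(u for u in uppers if all(leq(u, v) for v in uppers))

def meet(x, y):
    """Greatest lower bound, found directly from the order."""
    lowers = [w for w in X if leq(w, x) and leq(w, y)]
    return next(w for w in lowers if all(leq(z, w) for z in lowers))

bottom, top = 1, 30                                # least and greatest elements

axioms_hold = all(
    join(x, x) == x == meet(x, x)
    and join(x, y) == join(y, x) and meet(x, y) == meet(y, x)
    and join(join(x, y), z) == join(x, join(y, z))
    and meet(meet(x, y), z) == meet(x, meet(y, z))
    and join(x, bottom) == x == meet(x, top)
    and meet(x, join(x, y)) == x == join(x, meet(x, y))
    for x, y, z in product(X, repeat=3)
)

# Converse direction: x <= y exactly when the join of x and y is y.
order_recovered = all(leq(x, y) == (join(x, y) == y) for x in X for y in X)
```

In this example the join and meet turn out to be the least common multiple and greatest common divisor.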


Remark A finite lattice (or partially ordered set) can be represented by a 
Hasse diagram in the plane. The points of the lattice are represented by 
points in the plane, so that, if $a < b$, then the point $b$ is higher (that is, has a larger 
y-coordinate) than $a$. We join $a$ to $b$ by a line segment if $b$ covers $a$; that is, 
$a < b$ but no element $c$ satisfies $a < c < b$. An example is shown in Figure 7.1. 
Check that it is a lattice. Note that the order relation can be read off from the 
covering relation: if $a < z$, then there is a chain $a < b < \cdots < z$, each term 
covering the one before. 
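
The covering relation is easy to compute for a small example. This sketch assumes the subsets of a 3-element set ordered by inclusion and lists the line segments its Hasse diagram would contain; the function name `covers` is illustrative.

```python
from itertools import combinations

S = {1, 2, 3}
subsets = [frozenset(c)
           for r in range(len(S) + 1)
           for c in combinations(sorted(S), r)]

def covers(b, a):
    """b covers a: a < b strictly, with no element strictly between them."""
    return a < b and not any(a < c < b for c in subsets)

# One edge (a, b) for each pair in which b covers a.
hasse_edges = [(a, b) for a in subsets for b in subsets if covers(b, a)]
```

For frozensets, `<` is proper inclusion, so an edge joins two subsets exactly when the larger has one more element than the smaller.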


There are two very important examples of lattices, which we now describe. 


Example The subset lattice of a set S: the elements are all subsets of S, 
and the partial order is inclusion. Thus, $x \vee y = x \cup y$, $x \wedge y = x \cap y$, $0 = \emptyset$, and 
$1 = S$. The lattice shown in Figure 7.1 is the lattice of subsets of a 3-element set. 


These lattices have some additional properties, notably the following: 

Distributive laws: $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$ and $x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z)$. 


Any lattice satisfying these laws is called a distributive lattice. Clearly, 
any sublattice of the subset lattice is distributive. The converse also holds, at 
least for finite lattices: 


Theorem 7.52 Any finite distributive lattice is isomorphic to a sublattice of 
the subset lattice of a finite set. 


In the infinite case, some additional properties are required in order to obtain 
a similar characterisation. 


Fig. 7.1 A Hasse diagram 


Note also that the dual of a distributive lattice is distributive. 


Example Let A be an algebra (of some given type). Then the subalgebras of 
A form a lattice, the subalgebra lattice of A. The meet of two subalgebras 
is their intersection (which is always a subalgebra). The join of subalgebras $B_1$ 
and $B_2$ is usually not the union, which need not be a subalgebra, but is, rather, the 
subalgebra generated by $B_1$ and $B_2$, which can be described as the intersection 
of all subalgebras containing them, or as the smallest set containing $B_1$ and $B_2$ 
and closed under the operations of A. The subalgebra 1 is A, while 0 is the 
unique minimal subalgebra (the subalgebra generated by the constants). 
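
The description of the join as the smallest closed set containing both subalgebras translates directly into a closure computation. The following sketch assumes the additive group of integers mod 12, with binary, unary, and nullary operations addition, negation, and zero; `generated_subalgebra` is a name introduced for this example.

```python
def generated_subalgebra(gens, binary_ops, unary_ops=(), constants=()):
    """Smallest subset containing gens and the constants, closed under the operations."""
    closed = set(gens) | set(constants)
    while True:
        new = {op(x, y) for op in binary_ops for x in closed for y in closed}
        new |= {op(x) for op in unary_ops for x in closed}
        if new <= closed:
            return closed
        closed |= new

add = lambda x, y: (x + y) % 12
neg = lambda x: (-x) % 12

B1 = generated_subalgebra({4}, [add], [neg], [0])      # {0, 4, 8}
B2 = generated_subalgebra({6}, [add], [neg], [0])      # {0, 6}
union = B1 | B2                                         # {0, 4, 6, 8}, not closed: 4 + 6 = 10
join = generated_subalgebra(union, [add], [neg], [0])  # {0, 2, 4, 6, 8, 10}
meet = B1 & B2                                          # {0}
```

The union misses 10 and 2, so it is not a subalgebra; closing it under the operations gives the join.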


Example The partition lattice of a set S is defined as follows. The elements 
are the partitions of S (which we can regard as the equivalence relations on S). 
For partitions $\pi_1$ and $\pi_2$, we take $\pi_1 \le \pi_2$ if $\pi_1$ refines $\pi_2$, in the sense that 
any part of $\pi_1$ is contained in a part of $\pi_2$. (If we regard a partition as being 
an equivalence relation, that is, a certain set of ordered pairs, this is just the 
inclusion order on $\mathcal{P}(S \times S)$.) So $\pi_1 \wedge \pi_2$ is the partition whose parts are all 
non-empty intersections of parts of $\pi_1$ with parts of $\pi_2$. The partition $\pi_1 \vee \pi_2$ 
is more difficult to describe: it is not just the union of the sets of pairs, or the 
partition whose parts are all unions of parts of $\pi_1$ and $\pi_2$. Instead, join two points 
of S by an edge if they lie in the same part of either $\pi_1$ or $\pi_2$; then the parts of 
$\pi_1 \vee \pi_2$ are the connected components of this graph. The partition 0 is the one 
with singleton parts (the relation of equality), while the partition 1 has just one 
part, namely S. 
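
The meet and join of two partitions can be computed as just described. In this sketch a partition is a set of frozensets; the connected components for the join are found by repeatedly merging overlapping blocks, which is one of several possible ways to do it.

```python
def meet(p1, p2):
    """Parts are the non-empty intersections of a part of p1 with a part of p2."""
    return {a & b for a in p1 for b in p2 if a & b}

def join(p1, p2):
    """Parts are the connected components of the graph joining points that
    share a part of p1 or of p2, found by merging overlapping blocks."""
    merged = []                               # pairwise disjoint sets so far
    for block in [set(b) for b in list(p1) + list(p2)]:
        overlapping = [m for m in merged if m & block]
        for m in overlapping:
            block |= m
            merged.remove(m)
        merged.append(block)
    return {frozenset(m) for m in merged}

p1 = {frozenset({1, 2}), frozenset({3, 4}), frozenset({5})}
p2 = {frozenset({2, 3}), frozenset({1}), frozenset({4}), frozenset({5})}

p_meet = meet(p1, p2)      # all singletons: the partition 0
p_join = join(p1, p2)      # parts {1, 2, 3, 4} and {5}
```

Note that the part {1, 2, 3, 4} of the join is not a union of one part from each partition; it arises from the chain 1, 2, 3, 4 in the graph.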


Example More generally, if A is an algebra, then the congruences on A form a 
lattice, the congruence lattice of A. The order is as in the partition lattice, and 
the meet, 0, and 1 elements are the same; but the join $\pi_1 \vee \pi_2$ of two congruences 
$\pi_1$ and $\pi_2$ must be taken as the meet of all those congruences that are coarser 
than both. 


Proposition 7.53 The congruence lattice of a group, ring, or vector space is 
isomorphic to a sublattice of the subalgebra lattice. 


Proof In the case of a group, the classes of a congruence are the cosets of a 
normal subgroup; and the meet or join of two normal subgroups is a normal 
subgroup. So the normal subgroups form a sublattice of the subgroup lattice 
isomorphic to the congruence lattice. The arguments in the other cases are 
similar. 


Remark In an abelian group, every subgroup is normal, and so the subgroup 
and congruence lattices are isomorphic. The same applies to a vector space, or 
a ring (such as Z), in which every subring is an ideal. 


We now consider two classes of lattices, the first more special than distributive 
lattices, and the second more general. 

Boolean lattices form a special class of distributive lattices. In any lattice 
L, a complement of $x$ is an element $x'$ such that $x \vee x' = 1$ and $x \wedge x' = 0$. 
Complements may fail to exist, and they may not be unique. However, in a 
distributive lattice, an element has at most one complement. For, if $x'$ and $x^*$ 
are complements of $x$, then 


$x' = x' \wedge 1 = x' \wedge (x \vee x^*) = (x' \wedge x) \vee (x' \wedge x^*) = 0 \vee (x' \wedge x^*) = x' \wedge x^*$, 


so $x' \le x^*$. Similarly, $x^* \le x'$; so $x' = x^*$. 

Note that 0 and 1 are complements of each other; and, if y is a complement 
of x, then x is a complement of y. 

A Boolean lattice is a distributive lattice in which each element has a 
complement (necessarily unique). 

It is usual to take complementation as an operation. This gives us the 
following definition. A Boolean lattice is a set X with operations $\vee, \wedge, 0, 1, {}'$ (of 
arities 2, 2, 0, 0, 1) such that: 


(a) $(X, \vee, \wedge, 0, 1)$ is a distributive lattice; 
(b) $x \vee x' = 1$ and $x \wedge x' = 0$. 


The subset lattice of any set S is a Boolean lattice: complementation is given 
by $x' = S \setminus x$. These examples are typical: 


Theorem 7.54 A finite Boolean lattice is isomorphic to the lattice of subsets 
of a set. 


This can be deduced from the representation theorem for distributive lattices, 
or can be proved directly (Exercise 7.44). 

Now we turn to a larger class. A lattice is modular if it satisfies the following 
condition: 


Modular law: If $x \le z$, then $x \vee (y \wedge z) = (x \vee y) \wedge z$. 


This property is weaker than the distributive law. It is self-dual, in that a 
lattice satisfies the modular law if and only if its dual does (interchange $x$ and 
$z$ in the statement to obtain its dual). As written, it is not a law, but it can be 
converted into one by noting that $x \le z$ if and only if $x \vee u = z$ for some $u$; so 
the modular law can be written 


$x \vee (y \wedge (x \vee u)) = (x \vee y) \wedge (x \vee u)$. 


So modular lattices form a variety. In particular, a sublattice of a modular lattice 
is modular. The connection of the words ‘module’ and ‘modular’ is explained in 
the first part of the next result. 


Theorem 7.55 (a) For any ring R, the submodule lattice of an R-module is 
modular. 
(b) The congruence lattice of a group is modular. 


Remark In particular, this applies to the subspace lattice of a vector space, 
or the subgroup lattice of an abelian group (a $\mathbb{Z}$-module). 


Proof (a) Let X, Y, Z be submodules of the R-module M, with $X \le Z$. The 
join of two submodules is their sum. We have $X + (Y \cap Z) \le (X + Y) \cap Z$, since 
this holds in any lattice when $X \le Z$. Take $z \in (X + Y) \cap Z$. Then $z \in Z$, and $z = x + y$, 
with $x \in X$, $y \in Y$. Since $x \in X \le Z$, we have $y = z - x \in Z$, so $y \in Y \cap Z$, and 
$z = x + y \in X + (Y \cap Z)$, as required. 
(b) The argument is similar. 
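
To see that the modular law is strictly weaker than the distributive law, and that it can fail, one can test two five-element lattices by brute force. The sketch below uses the names M3 (0, 1, and three atoms) and N5 (the ‘pentagon’); these names are standard in lattice theory but are not used in the text, and the helper functions are illustrative.

```python
from itertools import product

def lattice_from_covers(elements, covers):
    """Build (leq, join, meet) from pairs (a, b) meaning that b covers a."""
    below = {x: {x} for x in elements}            # below[x] = all y with y <= x
    changed = True
    while changed:                                # transitive closure
        changed = False
        for a, b in covers:
            if not below[a] <= below[b]:
                below[b] |= below[a]
                changed = True
    leq = lambda x, y: x in below[y]
    def join(x, y):
        ub = [u for u in elements if leq(x, u) and leq(y, u)]
        return next(u for u in ub if all(leq(u, v) for v in ub))
    def meet(x, y):
        lb = [w for w in elements if leq(w, x) and leq(w, y)]
        return next(w for w in lb if all(leq(z, w) for z in lb))
    return leq, join, meet

def is_modular(elements, leq, join, meet):
    return all(join(x, meet(y, z)) == meet(join(x, y), z)
               for x, y, z in product(elements, repeat=3) if leq(x, z))

def is_distributive(elements, join, meet):
    return all(meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
               for x, y, z in product(elements, repeat=3))

# M3: 0, 1 and three pairwise incomparable atoms a, b, c.
M3 = ["0", "a", "b", "c", "1"]
M3_covers = [("0", "a"), ("0", "b"), ("0", "c"), ("a", "1"), ("b", "1"), ("c", "1")]
# N5: the chain 0 < a < c < 1 together with 0 < b < 1.
N5 = ["0", "a", "b", "c", "1"]
N5_covers = [("0", "a"), ("a", "c"), ("c", "1"), ("0", "b"), ("b", "1")]

leq3, join3, meet3 = lattice_from_covers(M3, M3_covers)
leq5, join5, meet5 = lattice_from_covers(N5, N5_covers)
m3_modular = is_modular(M3, leq3, join3, meet3)
m3_distributive = is_distributive(M3, join3, meet3)
n5_modular = is_modular(N5, leq5, join5, meet5)
```

In N5 the modular law fails for x = a, y = b, z = c: the left side is a, the right side is c.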


In Theorem 7.52, we saw that every finite distributive lattice is embeddable 
in the lattice of subsets of a set. There is no similar theorem for modular lattices, 
since there is no one modular lattice which is sufficiently general to embed the 
subspace lattices of all finite vector spaces, for example. Nevertheless, there is 
an important characterisation theorem for a subfamily of the modular lattices, 
which we now state, after a few definitions. 

An atom in a lattice is a minimal non-zero element. Thus, $x$ is an atom if 
$x \neq 0$ but $y \le x$ implies $y = x$ or $y = 0$. A lattice is atomic if every point is the 
join of a finite number of atoms. The rank of an element $x$ in an atomic lattice 
is the smallest number of atoms whose join is $x$; and the rank of the lattice 
is the rank of 1. 

A line is a lattice consisting of 0, 1, and a set A of atoms, with $|A| > 1$. It is 
proper if $|A| > 2$. See Figure 7.2. Note that a line with two atoms is isomorphic 
to the direct product of the lattice {0, 1} with itself. (If we permitted a line to 
have just one atom, such a line would not be atomic.) 

A projective plane is an atomic modular lattice of rank 3 with the property 
that the meet of any two elements of rank 2 is an atom. (Alternatively, it is an 
atomic modular lattice of rank 3 whose dual has the same properties.) It is 
proper if any element of rank 2 is above at least three atoms. Calling atoms 
points and elements of rank 2 lines, we can draw geometric diagrams rather 
than Hasse diagrams: see Figure 7.3. The points in this diagram are the atoms 
of the lattice, and the six straight lines and one circle define seven sets of three 
atoms corresponding to the lattice elements of rank 2; the 0 and 1 of the lattice 
do not appear in the diagram. 


Theorem 7.56 Let L be an atomic modular lattice. Then L is isomorphic to 
the direct product of a finite number of lattices of the following form: 


(a) L = {0, 1}; 
(b) a proper line; 
(c) a proper projective plane; 
(d) the submodule lattice of a finitely generated module over a division ring. 


Fig. 7.2 A line 

Fig. 7.3 A projective plane 


Remark By Wedderburn’s Theorem, a finite division ring is a field. So for 
finite lattices, case (d) of the theorem becomes the subspace lattice of a finite- 
dimensional vector space over a finite field. 


7.21 Category theory. Category theory, sometimes dismissed as ‘abstract 
nonsense’, began in a very technical part of mathematics, algebraic topology. 
It has developed into an alternative foundation for the whole of mathematics (in 
the most extreme form), and certainly a unifying principle which is algebraic in 
nature but much wider in scope. 

The underlying philosophy is that what is important about any class of math- 
ematical structures is the structure-preserving maps between different objects in 
the class. For example, suppose that our structures are just sets. If $f : X \to Y$ 
and $g : Y \to Z$ are maps between sets, then there is a composite $fg : X \to Z$. 
Moreover, $f$ is one-to-one if and only if there is a map $g : Y \to X$ with 
$fg = 1_X$ (where $1_X$ is the identity on X); and $f$ is onto if and only if there 
exists $h : Y \to X$ with $hf = 1_Y$. 
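
These two characterisations can be illustrated with maps between small finite sets, represented here as Python dictionaries and composed left to right to match the right-hand notation. The particular sets and maps are assumptions chosen for the example.

```python
def compose(f, g):
    """The composite fg: apply f first, then g (maps written on the right)."""
    return {x: g[f[x]] for x in f}

def identity(X):
    return {x: x for x in X}

# A one-to-one map f : X -> Y and a map g : Y -> X with fg = 1_X.
X, Y = [1, 2], ["a", "b", "c"]
f = {1: "a", 2: "c"}
g = {y: X[0] for y in Y}                         # arbitrary value off the image of f
g.update({image: x for x, image in f.items()})   # undo f on its image
fg_is_identity = compose(f, g) == identity(X)

# An onto map e : X2 -> Y2 and a map h : Y2 -> X2 with he = 1_{Y2}.
X2, Y2 = [1, 2, 3], ["a", "b"]
e = {1: "a", 2: "a", 3: "b"}
h = {y: next(x for x in X2 if e[x] == y) for y in Y2}   # choose one preimage
he_is_identity = compose(h, e) == identity(Y2)
```

Building `h` requires choosing a preimage for each point, which is where the Axiom of Choice would enter for infinite sets.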

Similarly, other set-theoretic notions can be recognised. Here are some more 
examples. The associative law for groups is usually stated as a law (in the sense 
of universal algebra), asserting the equality of two expressions (ab)c and a(bc). 
Another version involves the maps $\lambda_a$ and $\rho_a$ defined by left and right 
multiplication by the element $a$: the associative law asserts that $\lambda_a$ and $\rho_c$ commute for 
any $a, c \in G$: 


$b\lambda_a\rho_c = (ab)c = a(bc) = b\rho_c\lambda_a$. 


This version uses a mixture of elements and maps. But the law can be stated 
using only maps. Let $\mu : G \times G \to G$ be the group operation. If $\alpha_i : G_i \to H_i$ are 
maps for $i = 1, 2$, then $\alpha_1 \times \alpha_2 : G_1 \times G_2 \to H_1 \times H_2$ is defined coordinatewise. 
Now the associative law asserts that 


$(1 \times \mu)\mu = (\mu \times 1)\mu$, 


where 1 is the identity map on G. (The left-hand side maps $(a, b, c) \mapsto (a, bc) \mapsto a(bc)$, 
and the right-hand side $(a, b, c) \mapsto (ab, c) \mapsto (ab)c$.) 

In fact, we can define the Cartesian product of two sets using maps. Let X 
and Y be sets. The Cartesian product $X \times Y$ is a set P which has ‘projections’ 
$\pi_1$ and $\pi_2$ to X and Y respectively (by taking the first and second coordinate of 
each ordered pair). Moreover, if Z is any set and $\phi_1 : Z \to X$ and $\phi_2 : Z \to Y$ 
any maps, then there is a unique map $\psi : Z \to P$ such that $\psi\pi_i = \phi_i$ for $i = 1, 2$. 
(Set $z\psi = (z\phi_1, z\phi_2)$.) This property characterises the Cartesian product, up to 
isomorphism. Moreover, exactly the same properties and characterisation hold if 
we replace sets, maps, and Cartesian products by groups (or various other kinds 
of structures), homomorphisms, and direct products. 

For a final example, a basis X of a vector space V over F is a linearly 
independent spanning set. However, bases are also characterised (and could be 
defined) by the following mapping property: any map from X into an F-vector 
space W can be uniquely extended to a linear map from V to W. 

These examples give some insight into the viewpoint of category theory. The 
general definition is as follows: 

A category consists of the following data: 


• A set O of objects. 
• A set M of morphisms or arrows. 
• A pair of functions, dom (domain) and cod (codomain), from M to O. 
• For each $x \in O$, an identity morphism $1_x$. 
• A partial operation of composition on M, the composition of $f$ and $g$ (if 
it exists) being written $fg$. 


It satisfies the following axioms: 


• The composition $fg$ exists if and only if cod(f) = dom(g). If this holds, then 
dom(fg) = dom(f) and cod(fg) = cod(g). 
• If $fg$ and $gh$ are both defined, then $(fg)h = f(gh)$. 
• $\mathrm{dom}(1_x) = \mathrm{cod}(1_x) = x$. 
• If dom(f) = x and cod(f) = y, then $1_x f = f = f 1_y$. 


We abbreviate the information dom(f) = x and cod(f) = y by writing $f : x \to y$. 

Part of the philosophy of category theory is that morphisms are more impor- 
tant than objects. In fact, a category can be defined using only the morphisms, 
the partial composition, and the identity morphisms; we identify the objects with 
their corresponding identity morphisms. See Exercise 7.56. 

There are two, quite different, sources of examples of categories. Be careful 
to distinguish these, although the strength of category theory is that really no 
distinction needs to be made. 


Classes of structures Let O be a set of mathematical structures of some 
type. These may be universal algebras of a fixed type (such as groups, vector 
spaces over a given field, lattices). They may also, more generally, be topological 
spaces, differentiable manifolds, algebraic curves, and so on. 

Let M be the class of all structure-preserving maps between members of O. 
(For algebras, we take M to consist of all homomorphisms; for topological spaces, 
all continuous maps; and so on.) For $f \in M$, we take dom(f) and cod(f) to be 
the usual domain and codomain of f, and take composition to be the usual 
composition of functions and $1_x$ to be the identity map on $x$. 

It may also be possible to obtain a category by taking just some of the 
structure-preserving maps. For example, we could take just the one-to-one maps, or the 
onto maps; for differentiable manifolds we could take the continuous functions, 
the differentiable functions, the smooth functions ... 

A category of this sort, where the objects are sets (possibly with additional 
structure) and the morphisms are functions, is called a concrete category. 


Remark There is a set-theoretic point which has to be mentioned here, 
although I will not elaborate on this. We want to consider the category of all 
groups, for example. But the class of all groups is not a set; if it were, we 
could not escape problems in the foundations of set theory, such as Russell’s 
paradox. One way round this is to suppose that there is a very large ‘universal’ 
set U, in which all constructions which we want to perform can be made (some 
models of set theory provide such a set), and to consider the set of all groups 
which belong to U. If this remark means nothing to you, you may ignore it; if it 
intrigues you and you would like to know more, read a textbook on set theory 
and one on category theory and compare their approaches. 


Individual structures It may surprise you to learn that a group G is an 
example of a category. Take a single object called $*$ (say), and take G to be 
the set of morphisms, with dom(g) = cod(g) = $*$ for all $g \in G$, and $1_* = 1$, 
the identity of G. Since there is only one object, any pair of morphisms can be 
composed. 
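
For finite data, the category axioms can be verified mechanically. The sketch below assumes the cyclic group of order 3 (written additively) viewed as a category with one object; `is_category` is a hypothetical helper, and it encodes the first axiom by only ever composing f and g when cod(f) = dom(g).

```python
from itertools import product

def is_category(objects, morphisms, dom, cod, identity, compose):
    """Check the axioms on finite data; compose(f, g) is only called when cod(f) == dom(g)."""
    composable = [(f, g) for f, g in product(morphisms, repeat=2) if cod[f] == dom[g]]
    ax1 = all(dom[compose(f, g)] == dom[f] and cod[compose(f, g)] == cod[g]
              for f, g in composable)
    ax2 = all(compose(compose(f, g), h) == compose(f, compose(g, h))
              for f, g in composable for h in morphisms if cod[g] == dom[h])
    ax3 = all(dom[identity[x]] == x == cod[identity[x]] for x in objects)
    ax4 = all(compose(identity[dom[f]], f) == f == compose(f, identity[cod[f]])
              for f in morphisms)
    return ax1 and ax2 and ax3 and ax4

# The cyclic group Z_3 as a category with the single object "*".
G = [0, 1, 2]
objects = ["*"]
dom = {g: "*" for g in G}
cod = {g: "*" for g in G}
identity = {"*": 0}
group_is_category = is_category(objects, G, dom, cod, identity,
                                lambda f, g: (f + g) % 3)
```

With a single object every pair of morphisms is composable, so the associativity and identity axioms reduce to the corresponding group axioms.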


With this example in mind, we could say that categories form just another 
type of algebraic object, more general than groups. 

But now we have the option of turning the generality on itself. There is a 
category of all categories! (modulo the set-theoretic difficulties just discussed). 

Categories are more general than groups in two respects: there can be more 
than one object, and morphisms need not be invertible. Two intermediate classes 
of structures are obtained by relaxing one or other of these conditions. 

A groupoid is a category in which any morphism $f : x \to y$ has an inverse 
$g : y \to x$ (such that $fg = 1_x$ and $gf = 1_y$). An example is obtained by taking 
any class of structures as objects, and the isomorphisms as morphisms. 

A monoid is a category with a single object. In other words, it is a set with 
a (total) operation of composition, satisfying the associative and identity axioms 
for a group, but not necessarily the inverse axiom. Thus, the endomorphisms of 
a single structure x (the homomorphisms from x to x) form a monoid. 

For convenience, we will in future say ‘Let C = (O, M) be a category’, 
meaning that O and M are the sets of objects and morphisms of C. Of course, 
the notation ignores part of the category structure (the composition and the 
identities), but old habits die hard. 

Let C = (O, M) be a category. A subcategory C' consists of a subset O' of 
O and a subset M' of M such that C' = (O', M') is a category. An important 
special case occurs when M’ consists of all the morphisms of C whose domain and 
codomain lie in O’. In this case, C’ is called a full subcategory. For example, 
abelian groups form a full subcategory of the category of groups. 

The philosophy of category theory is that morphisms carry the essential infor- 
mation about objects. Naturally enough, we next define ‘morphisms between 
categories’. 

Let C = (O, M) and C' = (O', M') be categories. A functor from C to C' 
consists of a pair of maps (denoted by the same symbol F) from O to O' and 
from M to M', satisfying the following conditions: 


• $(fg)F = (fF)(gF)$ whenever $fg$ is defined; 
• $1_x F = 1_{xF}$ for all $x \in O$. 


Note that the map on morphisms (‘functions’) is an important part of a functor, 
not just an appendage of the map on objects; the name ‘functor’ is intended to 
suggest this. 

Functors are very common in mathematics. For example, let C be some 
concrete category of algebraic structures, and let C' be obtained by ignoring some of 
the structure. Then there is a forgetful functor from C to C'. For example, let 
C and C' be the categories of rings and abelian groups. Then the forgetful 
functor F maps a ring to its additive group, and a ring homomorphism to the same 
map (regarded merely as a group homomorphism). In particular, any concrete 
category has a forgetful functor to a category of sets. 

Here are some further examples. 


Derived group A different kind of functor maps groups to abelian groups. 
Let G be a group. The derived group G’ is the subgroup generated by all 
commutators $g^{-1}h^{-1}gh$; it is the smallest normal subgroup with abelian factor 
group (see Section 6.1.4). Now there is a functor from groups to abelian groups 
which maps G to G/G’. Of course, we have to define the action of the functor 
on morphisms (see Exercise 7.60). 


Unit group The functor U, from the category of rings with identity to the 
category of groups, maps a ring to its group of units. (Check that a ring homo- 
morphism maps units to units and induces a group homomorphism on the group 
of units.) 


General linear group More generally, for any n, the functor $\mathrm{GL}_n$ maps a
ring R with identity to the group GL(n, R) of invertible n × n matrices over R.


Power set The power set operation defines a functor from the category of sets
to itself.
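
The text does not say how the power set functor acts on maps. A minimal sketch, assuming the natural choice that a map f : X → Y is sent to the map taking each subset of X to its image under f, checks the two functor conditions on a small example. The names `power_set`, `image_map`, and `compose` are hypothetical, and maps are stored as dictionaries and composed left to right, following the convention of writing maps on the right.

```python
from itertools import combinations

def power_set(xs):
    """All subsets of xs, each as a frozenset."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def image_map(f):
    """Assumed action of the functor on a map f (a dict): subset -> its image."""
    return lambda subset: frozenset(f[x] for x in subset)

def compose(f, g):
    """Composite 'f then g' of two maps given as dicts (maps act on the right)."""
    return {x: g[f[x]] for x in f}

# Two composable maps X -> Y -> Z (illustrative data only).
X = {1, 2, 3}
f = {1: "a", 2: "a", 3: "b"}
g = {"a": True, "b": False}

# Condition (fg)F = (fF)(gF), checked on every subset of X.
composition_law_holds = all(
    image_map(compose(f, g))(s) == image_map(g)(image_map(f)(s))
    for s in power_set(X)
)

# Condition 1_X F = 1_{XF}: the identity map induces the identity on subsets.
identity = {x: x for x in X}
identity_law_holds = all(image_map(identity)(s) == s for s in power_set(X))
```

A check on one example is of course not a proof, but it shows what has to be verified in general.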


Homology In algebraic topology, one defines, for each positive integer n and 
topological space X, an abelian group $H_n(X)$, the nth homology group of
X. Homeomorphic spaces have isomorphic homology groups, and homology was 
originally a tool for telling topological spaces apart. It turns out that continuous 
maps between spaces induce homomorphisms between their homology groups. 
So $H_n$ is a functor from topological spaces to abelian groups.


Group actions Recall that any group G is a category with a single object *, in 
which the morphisms are the group elements. What is a functor F from G to the
category of sets? *F is a set $\Omega$. For all $g \in G$, gF is a map from $\Omega$ to $\Omega$, such that
$(g_1 g_2)F = (g_1 F)(g_2 F)$ and 1F is the identity map on $\Omega$. This is precisely the
definition of a permutation action of G on $\Omega$ (see Axioms (GA1) and (GA2) in
Section 7.1). So functors from G to sets are permutation representations (actions) 
of G. 

More generally, functors from G to any category C of algebras are actions
of G by automorphisms of an algebra in C. For example, if C consists of finite-
dimensional vector spaces over F, then a functor from G to C is a representation
of G by matrices over F.
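
As a small illustration of a group action viewed as a functor, the following sketch takes the cyclic group of order 3 (written additively) acting on {0, 1, 2} by rotation, and checks the two functor conditions. The choice of group and action, and the names `F`, `multiply`, and `then`, are assumptions made for the example.

```python
from itertools import product

n = 3
group = range(n)   # elements of the cyclic group Z_3, written additively
omega = range(n)   # the set on which the group acts

def multiply(g1, g2):
    """Group operation of Z_3."""
    return (g1 + g2) % n

def F(g):
    """Image of the morphism g: the map alpha -> alpha + g (mod n), as a tuple."""
    return tuple((alpha + g) % n for alpha in omega)

def then(p, q):
    """Composite 'p then q' of two maps on omega stored as tuples."""
    return tuple(q[p[alpha]] for alpha in omega)

# (g1 g2)F = (g1 F)(g2 F) for all pairs of group elements.
respects_composition = all(
    F(multiply(g1, g2)) == then(F(g1), F(g2))
    for g1, g2 in product(group, repeat=2)
)

# 1F is the identity map on omega (the identity of Z_3 is 0).
respects_identity = F(0) == tuple(omega)
```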


We now move to the next level in this process. A natural transformation is 
a homomorphism between functors: with each object, it associates a morphism 
between the images of the object under the two functors. More precisely, let 
$F, G : C \to C'$ be functors, where C = (O, M) and C’ = (O’, M’). A natural
transformation $T : F \to G$ is a function from O to M’ with the following
properties: 


• for any $x \in O$, $\mathrm{dom}(xT) = xF$ and $\mathrm{cod}(xT) = xG$;
• for any $f \in M$, with dom(f) = x and cod(f) = y, we have
$(fF)(yT) = (xT)(fG)$.


Note that $fF : xF \to yF$, $yT : yF \to yG$; and $xT : xF \to xG$, $fG : xG \to yG$,
so the composite morphisms on both sides are defined. The condition can be 
represented by a commutative diagram as follows: 


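\[
\begin{array}{ccc}
xF & \xrightarrow{\;fF\;} & yF \\
{\scriptstyle xT}\,\big\downarrow & & \big\downarrow\,{\scriptstyle yT} \\
xG & \xrightarrow{\;fG\;} & yG
\end{array}
\]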


This means that, if we start from an element of xF and map it to an element
of yG by following the arrows along either possible route, the result will be the 
same (independent of the route taken). 

It is probably still not clear what this definition means. In fact, mathematics 
abounds in important examples. Here are a few.


Determinant Let C be the category of commutative rings with identity, C’
the category of groups. As a small specialisation of an earlier example, both U
(the group of units) and $\mathrm{GL}_n$ (the group of invertible n × n matrices) are functors
from C to C’. We claim that det (determinant) is a natural transformation from
$\mathrm{GL}_n$ to U. The first assertion of the definition is that, for any commutative ring
R, det is a homomorphism from GL(n, R) to U(R): this, as we have seen, is a
fundamental property of the determinant, namely


det(AB) = det(A) det(B). 


The second property connects this with ring homomorphisms. If $f : R \to S$ is a
homomorphism, we have


(det(A)) f = det(Af),


where f denotes also the induced maps $\mathrm{GL}_n(R) \to \mathrm{GL}_n(S)$ and $U(R) \to U(S)$
(which we might, more consistently, call $\mathrm{GL}_n(f)$ and $U(f)$).
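
The second property can be tried out numerically. The sketch below assumes n = 2 and takes f to be reduction modulo 5 from the integers to the integers mod 5; the matrix A and the helper names are illustrative choices. The two routes round the naturality square give the same unit.

```python
def det2(m, modulus=None):
    """Determinant of a 2 x 2 matrix, optionally reduced modulo `modulus`."""
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return d if modulus is None else d % modulus

def reduce_matrix(m, p):
    """Apply reduction mod p to every entry: the induced map on matrices."""
    return [[entry % p for entry in row] for row in m]

p = 5
A = [[7, 3], [2, 1]]   # det A = 1, so A is invertible over the integers

route_1 = det2(A) % p                     # take det over Z, then apply f
route_2 = det2(reduce_matrix(A, p), p)    # apply the induced map, then take det over Z_5
```

Both routes give the unit 1 of the integers mod 5, as the commutative diagram requires.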


Group actions Let G be a group. We saw that a functor from G to the category
of sets is just a permutation action of G on a set $\Omega$. A natural transformation
between two such functors (actions on sets $\Omega_1$ and $\Omega_2$) is a G-homomorphism
between two such actions: that is, a map $T : \Omega_1 \to \Omega_2$ such that $(\alpha g)T = (\alpha T)g$
for all $\alpha \in \Omega_1$ and $g \in G$.


Double duals The dual space of an F-vector space V is the vector space V’
of all linear maps from V to F. Duality is not a functor as we have defined it,
since it ‘reverses arrows’; that is, if $f : V \to W$ is linear, then $f' : W' \to V'$ is
defined by


\[ v(\phi f') = (vf)\phi \]


for $\phi \in W'$, $v \in V$. (Duality is what is known as a contravariant functor.) If
D denotes duality, then $D^2$ is a functor from the category of vector spaces to
itself. Also, there is a natural transformation T from the identity to $D^2$: VT is
the map $V \to V''$ under which the image of $v \in V$ is the map $\phi \mapsto v\phi$ from V’
to F.

This makes precise the notion that there is a natural embedding of a space 
into its second dual, independent of any choice of basis. If we want to embed a 
space in its dual, we must make some choices: the embedding is not ‘natural’. 


Exercise 7.35 Show that the classes of rings, commutative rings, rings with identity, 
and abelian groups are varieties. 


Exercise 7.36 Let V be a variety of algebras with two unary operations $\alpha_1$ and $\alpha_2$
and a binary operation $\beta$, satisfying the laws


\[ (x, y)\beta\alpha_1 = x, \quad (x, y)\beta\alpha_2 = y, \quad (z\alpha_1, z\alpha_2)\beta = z. \]


Prove that any algebra in V with more than one element is infinite.


Exercise 7.37 A quasigroup is an algebra with three binary operations $\mu$, $\lambda$, $\rho$
satisfying the laws


\[
(x, (x, y)\mu)\lambda = y, \quad (x, (x, y)\lambda)\mu = y, \quad
((x, y)\mu, y)\rho = x, \quad ((x, y)\rho, y)\mu = x.
\]


(a) Show that, in a quasigroup, the three equations $(x, y)\mu = z$, $(x, z)\lambda = y$,
$(z, y)\rho = x$ are equivalent. (Thus, $\lambda$ and $\rho$ are ‘left and right division’ with respect
to the ‘multiplication’ $\mu$.)

(b) Show that the operation table of a binary operation $\mu$ on A has the property
that each element occurs exactly once in each row or column if and only if there are
operations $\lambda$ and $\rho$ such that the three operations $\mu$, $\lambda$, $\rho$ define a quasigroup on A.

[A table with the property described in this problem is a Latin square.]


Exercise 7.38 (*) This exercise shows that, in an algebra of given type, we can write
elements unambiguously without needing brackets. 

Consider an alphabet consisting of a set of variables and a set of operation symbols, 
each operation symbol having a given non-negative arity. Define the variability of a 
word in this alphabet to be the integer obtained by subtracting from its length the sum 
of the arities of the operation symbols that it contains. A prefix of a word is obtained 
by deleting any number of symbols from the end of the word. Show that a word w 
represents a (unique) element of an algebra (after substituting elements of the algebra 
for the variables) if and only if the following two conditions hold: 


(a) w has variability 1; 
(b) every non-empty prefix of w has positive variability. 


Devise a decoding algorithm which tests whether a word satisfies this condition and, 
if so, parses the word. 


Exercise 7.39 (**) Let A be the set of all words satisfying conditions (a) and (b)
above. For each n-ary operator symbol $\mu$, and any n elements $a_1, \ldots, a_n \in A$, show
that $a_1 \cdots a_n \mu \in A$. Hence show that A is an algebra with the given collection of
operators. Show that, if B is any algebra of the same type, and we choose an element
$b_i \in B$ corresponding to each variable $x_i$, then there is a unique homomorphism from
A to B which maps $x_i$ to $b_i$ for each i.

[A is called the free algebra of the given type with the given set of variables as
generators.]


Exercise 7.40 Does the family of unique factorisation domains form a variety? 


Exercise 7.41 In a group G, the commutator [x, y] of elements x and y is the element
$x^{-1}y^{-1}xy$.


(a) Show that [x, y] = 1 if and only if x and y commute.

(b) Hence show that G is abelian if and only if it satisfies the law $[x_1, x_2] = 1$.

(c) Let N be a normal subgroup of G such that G/N is abelian. Show that
all commutators [x, y] belong to N. Hence show that, if also N is abelian (so that
G is metabelian), then any two commutators commute, and G satisfies the law
$[[x_1, x_2], [x_3, x_4]] = 1$.

(d) Show that the subgroup H generated by all commutators in G (the derived
group or commutator subgroup of G) is normal, and that G/H is abelian.

(e) Show that, if G satisfies the law $[[x_1, x_2], [x_3, x_4]] = 1$, then its derived group is
abelian, and hence G is metabelian.

*(f) Generalise the above to show that soluble groups of derived length at most d 
form a variety. 


Exercise 7.42 A congruence on an algebra A is an equivalence relation on A; that is,
a subset E of A × A. Prove that it is a subalgebra of A × A. Is every subalgebra of
A × A a congruence? If not, can you formulate necessary and sufficient conditions?


Exercise 7.43 Let X be a Boolean lattice. Define new operations + and · on X by the
rules $x + y = (x \vee y) \wedge (x \wedge y)'$, $x \cdot y = x \wedge y$. Prove that (X, +, ·) is a Boolean ring (a ring
satisfying the law $x^2 = x$). Show also that any Boolean ring gives rise to a Boolean
lattice by taking $x \vee y = x + y + xy$, $x \wedge y = xy$.

Prove that the categories of Boolean lattices and Boolean rings are naturally 
isomorphic. 


Exercise 7.44 (*) In this exercise, we show that a finite Boolean lattice is isomorphic
to the lattice of subsets of a set.


We must construct a set S, and a subset s(x) of S for each $x \in X$, such that:


• Every subset of S has the form s(x) for a unique $x \in X$.
• $s(0) = \emptyset$ and $s(1) = S$.
• $s(x \vee y) = s(x) \cup s(y)$ and $s(x \wedge y) = s(x) \cap s(y)$.
• $s(x') = S \setminus s(x)$.


An ideal in a Boolean lattice X is a subset I of X such that:


• if $a, b \in I$ then $a \vee b \in I$;
• if $a \in I$ and $x \in X$ then $a \wedge x \in I$.

Note the analogy with ideals in a ring (with $\vee$ and $\wedge$ taking the place of + and ·).
Let S be the set of all maximal ideals of X (those contained in no larger ideal), and
for $x \in X$, let s(x) denote the set of maximal ideals containing x.
Show that the properties listed above are indeed satisfied.


Exercise 7.45 (*) Prove that a distributive lattice is modular.


Exercise 7.46 Prove that, in any lattice, if $x \le z$, then

\[ x \vee (y \wedge z) \le (x \vee y) \wedge z. \]

[Hint: Show that, if $a_i \le b_j$ for i, j = 1, 2, then $a_1 \vee a_2 \le b_1 \wedge b_2$.]


Exercise 7.47 Complete the proof of Theorem 7.55(b).


Exercise 7.48 True or false? 


(a) A partition lattice is modular. 
(b) A subalgebra lattice is modular. 


Exercise 7.49 Let S be a set, and F a field. Let V be the vector space of all functions
from S to F, with pointwise operations. For each partition $\pi$ of S, let $V_\pi$ denote the
subspace

\[ \{ f \in V : f(x) = f(y) \text{ whenever } x, y \text{ are in the same part of } \pi \}. \]

Prove that


(a) The map $\pi \mapsto V_\pi$ is one-to-one;
(b) $V_{\pi_1 \vee \pi_2} = V_{\pi_1} \wedge V_{\pi_2}$.


Is it true that the dual of the partition lattice of S is embeddable in the subspace lattice
of V?


Exercise 7.50 A lattice is complete if any subset has a least upper bound and a 
greatest lower bound. 


(a) Show that a finite lattice is complete. 

(b) Show that a lattice in which any subset has a greatest lower bound is complete. 

(c) Show that the subalgebra lattice of an algebra is complete. What about the 
congruence lattice? 

(d) Give an example of a lattice which is not complete. 


Exercise 7.51 Prove that a finite lattice is Boolean if and only if it is the direct 
product of copies of {0, 1}. 


Exercise 7.52 (a) Prove that the subspace lattice of a 2-dimensional vector space over
F is a line with |F| + 1 points.
(b) Prove that the subspace lattice of a 3-dimensional vector space is a projective
plane.
(c) Show that the lattice shown in Figure 7.3 is the lattice of subspaces of $\mathrm{GF}(2)^3$.


Exercise 7.53 Let L be an atomic lattice, with A the set of atoms. For any $x \in L$, let
S(x) denote the set of atoms $a \in A$ satisfying $a \le x$.


(a) Prove that $S(x) \cap S(y) = S(x \wedge y)$.

(b) If L is Boolean, show that $S(x') = A \setminus S(x)$, and deduce that $S(x \vee y) =
S(x) \cup S(y)$. Hence prove Theorem 7.54. [You have to show that a finite Boolean lattice
is atomic.]


Exercise 7.54 Let $M_3$ be the three-point line and $N_5$ the pentagon (Figure 7.4). Prove
that $M_3$ is not distributive and $N_5$ is not modular.


Exercise 7.55 (*) Show that, if a finite line is the subgroup lattice of a group, then
the number of atoms is p + 1, where p is prime. Which groups have such a subgroup
lattice? 


Fig. 7.4 Two lattices


Exercise 7.56 Given a set M of morphisms with a partial composition and a subset
I of identities, suppose that the following conditions are satisfied:


• For any $f, g, h \in M$, if fg and gh are defined, then (fg)h and f(gh) are
defined and are equal.

• For any $f \in M$, there are unique identities i and j such that if and fj are
defined; and if = fj = f.

• For $f, g \in M$, fg is defined if and only if there is an identity j such that fj
and jg are defined.

• For any identity i, ii is defined.


Prove that M is the set of morphisms of a category. 


Exercise 7.57 Let $X_1$ and $X_2$ be groups. Let P be a group and let $\pi_i : P \to X_i$
(i = 1, 2) be homomorphisms. Suppose that, if Z is any group and $\phi_i : Z \to X_i$
(i = 1, 2) are homomorphisms, then there is a unique homomorphism $\psi : Z \to P$ such that
$\psi\pi_i = \phi_i$ for i = 1, 2.

Prove that P is isomorphic to $X_1 \times X_2$.


Exercise 7.58 (*) We can turn the preceding exercise into a definition. Let $X_1$ and $X_2$
be objects in a category. A direct product of $X_1$ and $X_2$ is an object P and a pair
$\pi_i : P \to X_i$ (i = 1, 2) of morphisms such that, for any object Z and morphisms $\phi_i :
Z \to X_i$ (i = 1, 2), there is a unique morphism $\psi : Z \to P$ such that $\psi\pi_i = \phi_i$ for
i = 1, 2.

Show that any two direct products of X; and X2 are isomorphic. 

Give an example of two objects in a category which do not have a direct product. 


Exercise 7.59 (**) Give a similar definition of inverse limit in a category (see
Exercise 7.29). 


Exercise 7.60 Let $\theta : G \to H$ be a homomorphism of groups. Prove that $G'\theta \le H'$.
Hence show that $\theta$ induces a unique homomorphism $\theta^* : G/G' \to H/H'$.

Hence show how to define a functor from groups to abelian groups which maps G 
to G/G’ for any group G. 


Exercise 7.61 Why is there no natural way to define a functor from groups to abelian 
groups which maps G to Z(G) (the centre of G)? 


Exercise 7.62 A preorder is a reflexive and transitive relation on a set. 


(a) Show that any preordered set (X, P) is a category, with object set X and
morphism set P, with dom(x, y) = x and cod(x, y) = y for all $(x, y) \in P$, and $1_x =
(x, x)$.

(b) Show that a category is a preorder if and only if there is at most one morphism 
with any given domain and codomain. 


Exercise 7.63 Prove that $\mathrm{GL}_n$ is a functor.


8 Applications 


From the surprisingly many applications of abstract algebra, I have chosen just 
two. One of these is the construction of Évariste Galois in the early nineteenth
century; it explains why there is no formula for the solution of a polynomial
equation of degree 5 or greater (comparable to the familiar formula
$x = (-b \pm \sqrt{b^2 - 4ac})/2a$ for the solution of the quadratic equation $ax^2 + bx + c = 0$). The
other is a much more recent development, the theory of error-correcting codes, 
for transmitting information through noisy channels. 


Coding theory 


8.1 Codes. One of the most famous applications of coding theory occurred 
in the exploration of the outer planets of the solar system by unmanned space 
probes. These probes carried scientific equipment and cameras. The informa- 
tion about temperatures, magnetic fields, and so on was very important to 
astronomers, but the pictures of Jupiter, Saturn, and their moons made us all 
aware of the existence of other worlds with their distinctive characters. 

Typically, one of these probes had a generator capable of producing a few 
hundred watts of electric power, of which only a few tens of watts was available to 
the transmitter responsible for sending the information back to earth. This weak 
signal had to be separated from the radio ‘noise’ produced by the universe, and 
the useful information filtered from it. Naturally, sometimes the signal received 
was incorrect as a result of this interference. The job of error correction is to 
ensure that the correct information is received. 

The procedure can be divided into a number of stages. 


Stage 1: Generation of messages For simplicity, all information is sent as a 
sequence of ‘words’ or blocks of zeros and ones. A picture is divided into a large 
grid of small squares or ‘pixels’. Each pixel is then scanned and the intensity of 
each of the three primary colours measured. This measurement is then digitised 
to be an integer in the range [0, 255] say, and the resulting integer converted into 
8-bit binary form. In this way, a picture becomes a very long string of zeros and 
ones, which can be chopped up into blocks of fixed length. 
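
Stage 1 can be sketched in a few lines of Python. The function names, the sample pixel, and the decision to pad the final block with zeros are assumptions made for illustration; the text does not say how an incomplete last block is handled.

```python
def pixels_to_bits(pixels):
    """Convert (r, g, b) intensities in the range 0..255 into one string of 0s and 1s."""
    return "".join(format(value, "08b") for pixel in pixels for value in pixel)

def chop(bits, block_length):
    """Split a bit string into blocks of fixed length, padding the last block with zeros."""
    padded = bits + "0" * (-len(bits) % block_length)
    return [padded[i:i + block_length] for i in range(0, len(padded), block_length)]

# One pixel with intensities 255, 0, 128 becomes 24 bits, then six blocks of length 4.
bits = pixels_to_bits([(255, 0, 128)])
blocks = chop(bits, 4)
```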


Stage 2: Encoding Each block is then translated into a longer block called a 
‘codeword’. This slows down the transmission time, since more bits have to be 
sent; the redundancy is used for error correction. The encoding is devised so that 
any two codewords look very different. Then even if a few bits are changed during 
transmission, what is received should look more like the transmitted codeword 
than any other, and decoding is possible.


Stage 3: Transmission The codewords are sent by the transmitter, and
received by Earth-based equipment (possibly with some errors).


Stage 4: Decoding As explained, the receiver has the list of all possible 
codewords, and can compare the received word to them to find which is most 
similar. 


Stage 5: Recovery of the information From the codeword, we can translate 
back into the binary string, and interpret it as a picture (or as scientific data, as 
appropriate). 


We now develop the mathematical language to describe this. 


Definition Let F be a set of symbols, called the alphabet (with |F| =
q > 1), and let n be a positive integer. A word of length n over F is an n-
tuple of symbols from F. It is common to write a word as $a_1 a_2 \cdots a_n$ instead
of $(a_1, a_2, \ldots, a_n)$.

A code C of length n is a subset of the set $F^n$ of all words of length n,
subject to the condition that |C| > 1. Its elements are called codewords.


Remarks The reason for requiring that |C| > 1 is that, if only one message 
could be sent, no information could ever be conveyed (except the information 
that the transmitter is operating). In the space probe, the alphabet F is the
binary alphabet {0, 1}. 


Definition Let v,w be words of length n. The Hamming distance d(v, w) 
from v to w is the number of coordinates where v and w differ: 


\[ d(v, w) = |\{ i : 1 \le i \le n,\ v_i \ne w_i \}|. \]


Remark The motivation here is that we regard a single ‘error’ in transmission 
as changing one letter in a word. So the Hamming distance d(v, w) is the number 
of errors which would be required to change the transmitted word v into the 
received word w. We suppose that our system has the property that it is unlikely 
that a large number of errors will occur; so, with high probability, the Hamming 
distance between transmitted and received words is not too great. 
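
The definition translates directly into code. In this minimal sketch, words are represented as Python strings or sequences of equal length; the function name is an illustrative choice.

```python
def hamming_distance(v, w):
    """Number of coordinates in which the words v and w differ."""
    if len(v) != len(w):
        raise ValueError("words must have the same length")
    return sum(1 for a, b in zip(v, w) if a != b)

# The words 10110 and 11100 differ in the second and fourth coordinates.
example = hamming_distance("10110", "11100")
```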


Proposition 8.1 (a) For any words v, w, we have $d(v, w) \ge 0$, and
d(v, w) = 0 if and only if v = w.
(b) For any words v, w, we have d(w, v) = d(v, w).
(c) (The triangle inequality.) For any words u, v, w, we have $d(u, v) +
d(v, w) \ge d(u, w)$.


Proof (a) and (b) are trivial. 

For (c), note that d(u,v) errors will turn u into v, and a further d(v,w) 
errors turn v into w. But some coordinates may have been changed twice, so the 
distance from u to w may be smaller than the sum of d(u,v) and d(v, w). 




Remark In topology, a metric space is defined to be a set M with a ‘distance 
function’ or metric d from M x M to the non-negative real numbers satisfying 
conditions (a), (b), and (c) of the proposition. So we have shown that the set of 
words of length n, equipped with Hamming distance, is a metric space. 


Definition Let e be a positive integer. The code C of length n is said to be
e-error-correcting if the following holds: for any word w of length n, there is
at most one codeword $c \in C$ which satisfies $d(w, c) \le e$.


The reason for the name is as follows: Suppose that C is e-error-correcting.
Suppose that we use C in a communication system in which we know, or can
be fairly certain, that not more than e errors will occur during the transmission
of a single word. Then these errors will be corrected. For suppose that c is
transmitted and w received. Then by our assumption, $d(c, w) \le e$. Since C is
e-error-correcting, any other codeword c’ satisfies d(c’, w) > e. So c is the nearest
codeword to the received word, and the decoding is correct.


Definition The minimum distance of a code C is the least distance between 
two distinct codewords in C. 


Theorem 8.2 The code C is e-error-correcting if and only if its minimum 
distance is 2e +1 or greater. 


Proof Suppose that C is not e-error-correcting, so that there exist a word
w and two different codewords $c_1$ and $c_2$ both at distance e or less from w:
that is, $d(c_1, w) \le e$ and $d(c_2, w) \le e$. By the properties of Hamming distance,
$d(w, c_2) \le e$, and so $d(c_1, c_2) \le e + e = 2e$. Hence it is not true that C has
minimum distance 2e + 1 or more.

Conversely, suppose that C has minimum distance $d \le 2e$. Choose f to be the
integer part of d/2. Then $f \le e$ and $d - f \le e$. There are two codewords $c_1$ and
$c_2$ with $d(c_1, c_2) = d$. Thus we can get from $c_1$ to $c_2$ by changing d coordinates.
Let these be changed one at a time, and let w be the word obtained when f
coordinates have been changed. Then $d(c_1, w) = f \le e$, and $d(c_2, w) = d - f \le e$.
So C is not e-error-correcting.
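
Theorem 8.2 can be checked by brute force on a small code. The sketch below tests the definition of e-error-correcting directly (by running through all words) and compares the answer with the minimum distance. The four-word binary code and the function names are illustrative assumptions, not taken from the text.

```python
from itertools import combinations, product

def hamming_distance(v, w):
    """Number of coordinates in which v and w differ."""
    return sum(a != b for a, b in zip(v, w))

def minimum_distance(code):
    """Least distance between two distinct codewords."""
    return min(hamming_distance(c1, c2) for c1, c2 in combinations(code, 2))

def is_e_error_correcting(code, e, alphabet="01"):
    """Definition check: no word lies within distance e of two codewords."""
    n = len(next(iter(code)))
    for word in product(alphabet, repeat=n):
        close = [c for c in code if hamming_distance(word, c) <= e]
        if len(close) > 1:
            return False
    return True

# A small binary code of length 5 (illustrative data).
code = ["00000", "11100", "00111", "11011"]
d = minimum_distance(code)
```

Here d = 3, so the theorem predicts that the code corrects one error but not two, and the exhaustive check agrees.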


Thus, we want a good code to have large minimum distance (so that it will 
correct as many errors as possible). We also want it to have as many codewords 
as possible: the more codewords, the faster information can be sent. 

To measure this, we define the rate of a code C of length n over an alphabet of
q symbols to be $\log_q |C| / n$. The motivation is that, if $|C| = q^k$, then $q^k$ messages
can be encoded. Without the encoding, each message could be sent as a k-tuple;
after coding, it becomes an n-tuple, and so transmission is k/n times as fast as
without the encoding.

The tension between error correction and rate is expressed in various inequalities
connecting the minimum distance and size of a code. Here are two of the
simplest: 


Theorem 8.3 Let C be a code of length n over an alphabet of q symbols, having 
minimum distance d. 


(a) (Hamming bound): If $d \ge 2e + 1$ (that is, if C is e-error-correcting), then


\[ |C| \le q^n \Big/ \sum_{i=0}^{e} \binom{n}{i} (q-1)^i . \]

(b) (Singleton bound): $|C| \le q^{n-d+1}$.


Proof (a) Let c be a codeword. We count the number of words w such that
$d(c, w) \le e$. How many satisfy d(c, w) = i? We have to make i errors, which we
can do by choosing i coordinates to change (in $\binom{n}{i}$ ways), and then changing the
entry in each of these coordinates to a different symbol in the alphabet (q − 1
choices for each coordinate, so $(q-1)^i$ altogether). Multiplying these numbers
and summing over i, the number of words w which satisfy $d(c, w) \le e$ is


\[ \sum_{i=0}^{e} \binom{n}{i} (q-1)^i . \]

We can regard these words as forming a ‘ball’ of radius e having the codeword c
as its centre.

If we do this for all codewords, there is no overlap among the words we find.
For, by assumption, C is e-error-correcting; so no word is at distance e or less
from more than one codeword. Geometrically, the balls are packed into the space
without overlap. So the number of words accounted for is


\[ |C| \cdot \sum_{i=0}^{e} \binom{n}{i} (q-1)^i . \]


But this cannot exceed the total number $q^n$ of words.


(b) Look at the codewords through a window which shows only the first
n − d + 1 coordinates. The pieces we see are all different. For if the first n − d + 1
coordinates of $c_1$ and $c_2$ are the same, then these codewords cannot differ in
more than the last n − (n − d + 1) = d − 1 coordinates; so $d(c_1, c_2) \le d - 1$,
contrary to assumption.

So the number of codewords does not exceed the total number $q^{n-d+1}$ of
things that can be seen through the window.


Remark Codes which attain these bounds are of particular importance. Such
codes are called perfect for the Hamming bound, or maximum-distance
separable (MDS for short) for the Singleton bound. We will see interesting
examples shortly, but here is a simple example. The repetition code of length
n consists of all codewords $aa \cdots a$ (for $a \in F$). The idea is that, to get the
message through, repeat it lots of times! This code has q codewords (one for
each symbol), and the minimum distance is equal to the length n. The Singleton
bound for d = n asserts that $|C| \le q^{n-n+1} = q$, so this bound is met. In the case
q = 2, if n is odd, say n = 2e + 1, the Hamming bound is also attained.
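
Both bounds of Theorem 8.3 are easy to evaluate numerically. The following sketch (function names are illustrative) computes them and confirms the claim about the binary repetition code in the case n = 5, e = 2.

```python
from math import comb

def ball_size(n, q, e):
    """Number of words within Hamming distance e of a fixed word of length n."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(e + 1))

def hamming_bound(n, q, e):
    """Largest |C| permitted by the Hamming bound (rounded down to an integer)."""
    return q ** n // ball_size(n, q, e)

def singleton_bound(n, q, d):
    """Largest |C| permitted by the Singleton bound."""
    return q ** (n - d + 1)

# Binary repetition code of length 5: two codewords, minimum distance 5, e = 2.
repetition_size = 2
meets_hamming = hamming_bound(5, 2, 2) == repetition_size
meets_singleton = singleton_bound(5, 2, 5) == repetition_size
```

For n = 5 the ball of radius 2 contains 1 + 5 + 10 = 16 words, and two such balls fill all 32 binary words of length 5.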


There is more to coding theory than just resolving this tension between large 
minimum distance and many codewords. The codes should not be too difficult to 
implement. (Recall the space probe: the encoding must be done by a simple low- 
powered machine.) It turns out that concepts from algebra are crucial for this. 


8.2 Linear codes. Let us formalise the encoding and decoding processes.
Let S be a set of messages, and C a code of length n over an alphabet F
with q symbols. Then encoding is a function $\varepsilon : S \to C$, which is one-to-one
(since if two messages were assigned to the same codeword, the receiver could
never decide which had been intended). In fact, we normally assume that the
encoding function is a bijection, since codewords which are never used could be
removed from the code. In this case, there is an inverse map $\varepsilon^{-1}$ which translates
codewords back to messages.

Decoding is a function $\delta : F^n \to C$. No formal restrictions are made; but often
we assume that it is nearest-neighbour decoding. This means that $\delta(w)$ is
always a codeword which is as near as possible to w (one which minimises d(c, w)
over all $c \in C$). If there is more than one codeword at the smallest distance from
w, then $\delta(w)$ should be one such, but we do not specify which one. We could
if required follow $\delta$ with the inverse of $\varepsilon$ to get a map from $F^n$ to the set S of
messages.
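
Nearest-neighbour decoding can be sketched as a search through the code. The tie-breaking rule here (take the first closest codeword in the order the code is listed) is an arbitrary choice, in keeping with the remark that the definition does not specify one; the sample code is illustrative.

```python
def hamming_distance(v, w):
    """Number of coordinates in which v and w differ."""
    return sum(a != b for a, b in zip(v, w))

def nearest_neighbour_decode(word, code):
    """Return a codeword at minimum Hamming distance from `word`.

    Ties are broken by the order in which the codewords are listed.
    """
    return min(code, key=lambda c: hamming_distance(word, c))

code = ["00000", "11100", "00111", "11011"]   # illustrative code of length 5
decoded = nearest_neighbour_decode("10100", code)
```

The received word 10100 is at distance 1 from 11100 and at distance at least 2 from every other codeword, so it is decoded to 11100.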

Usually, F is the so-called binary alphabet {0, 1}, which can be regarded
as the field $\mathbb{Z}_2$. We will be more general, and assume only that F is a finite field.
Then the set $F^n$ is an n-dimensional vector space over F. We will see that there
are good reasons for assuming that C is a subspace of $F^n$. If it is a k-dimensional
subspace, then $|C| = q^k$. So there are also $q^k$ messages in S, and we may assume
that they are all the k-tuples of elements of F. Then it is natural to make the
encoding map a linear transformation!


Definition Let the alphabet F be a finite field. The code C is a linear code 
if it is a subspace of the F-vector space $F^n$.


There are other advantages too. 


Definition Suppose that the alphabet is a finite field. The weight wt(w) of a 
word w is the number of non-zero coordinates of w. The minimum weight of 
a code is the smallest non-zero weight of any codeword. 


Proposition 8.4 The minimum weight and minimum distance of a linear code
are equal.


Proof First we show that d(v, w) = wt(v − w). This holds because the ith
coordinates of v and w are unequal if and only if the ith coordinate of v − w is
non-zero.

Now let C have minimum distance d and minimum weight f. Let $d(c_1, c_2) = d$.
Then $\mathrm{wt}(c_1 - c_2) = d$, and $c_1 - c_2 \in C$ by linearity; so $d \ge f$ (as f is the minimum
weight). Conversely, let wt(c) = f. Then, by definition, d(c, 0) = f, where 0 is the
all-zero word; and $0 \in C$ by linearity, so $f \ge d$ (as d is the minimum distance).
We conclude that d = f, as required.


Thus, instead of comparing all pairs of codewords to find the minimum dis- 
tance, in a linear code it is only necessary to look at all codewords to find 
the minimum weight. 

Also, if c is transmitted and w received, then w = c + x, where the weight of
x is equal to the number of errors which occurred. 

How do we describe a linear code C? 

Since C is nothing but a subspace of $F^n$, we choose a basis for it, a set of k
words of length n. We can arrange these vectors as the rows of a k × n matrix
G, called a generator matrix for C. Thus, formally, a generator matrix for C
is a matrix whose rows form a basis for C; and C is the row space of G.

The reason for the term ‘generator matrix’ is as follows: Any codeword can
be written uniquely as $x_1 g_1 + \cdots + x_k g_k$, where $g_1, \ldots, g_k$ are the rows of G.
More briefly, this is $xG$, where $x = x_1 \cdots x_k \in F^k$. Thus, if the set S of messages
to be sent is the set $F^k$ of all words of length k, then the encoding map $\varepsilon$ is just
the linear map $x \mapsto xG$ from $F^k$ to $F^n$: it is one-to-one, and its image is C.

In the binary case, the matrix multiplication involved in computing the encod- 
ing map can be performed by a very simple circuit, which can be built into a 
space probe. 

If we apply elementary row operations to G, we do not change its row space, 
and hence the result is still a generator matrix for C. (We do, however, change 
the encoding map.) By the results of Section 4.10, we can assume that G is 
in reduced echelon form. Now, given a matrix in reduced echelon form, we can 
apply column permutations to bring the columns containing the leading 1s to 
the front, obtaining a matrix of the form (I A) in block form. Of course, a
column permutation does change the code: it has the effect of applying the 
same permutation to all codewords. However, this does not change weights or 
Hamming distances, which are the important things as far as coding theory goes; 
the new code is as good as the old one. So we may assume, if necessary, that the 
generator matrix is in the standard form G = (I A).

If G is in standard form, then the encoding map is given by


\[ x \mapsto xG = (x \;\; xA). \]

We see that the first k symbols of the codeword are precisely the message being
sent. This makes the recovery of the message from the codeword after decoding
particularly simple. These k symbols are called the information digits, and
the remaining n − k symbols the check digits. Encoding consists of taking the
message and adding to it some new digits for error correction.
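
To illustrate encoding with a generator matrix in standard form, here is a minimal sketch over the binary field. The block A below is a made-up example with k = 2 and n = 5, not a matrix from the text, and the function name is illustrative.

```python
def encode_standard_form(message, A):
    """Encode with generator matrix (I A) over Z_2: the codeword is (x, xA)."""
    k, r = len(A), len(A[0])
    if len(message) != k:
        raise ValueError("message length must equal the number of rows of A")
    # Each check digit is a sum (mod 2) of the information digits selected by a column of A.
    checks = [sum(message[i] * A[i][j] for i in range(k)) % 2 for j in range(r)]
    return list(message) + checks

A = [[1, 1, 0],
     [0, 1, 1]]          # hypothetical 2 x 3 block; G = (I A) is then 2 x 5

codeword = encode_standard_form([1, 1], A)
```

The first two digits of the codeword repeat the message, and the last three are the check digits.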


It is time for an example. 


Example 1 Over the binary field $\mathbb{Z}_2$, let


\[
G = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}.
\]


Then G is the generator matrix, in standard form, of a binary linear code C with
length 7 and dimension 4. By listing all $2^4 = 16$ codewords, it is easily checked
that the minimum weight is 3, so that C is 1-error-correcting. Encoding takes a
message $x_1 x_2 x_3 x_4$ to a codeword $x_1 \cdots x_7$, where


\[ x_5 = x_2 + x_3 + x_4, \quad x_6 = x_1 + x_3 + x_4, \quad x_7 = x_1 + x_2 + x_4. \]

This code attains the Hamming bound of Theorem 8.3(a), since

\[ |C| = 16 = 2^7 / (1 + 7(2 - 1)). \]


This means that the balls of radius 1 with centres at the codewords cover the 
whole of $F^7$, so that any word whatever is at distance 0 or 1 from a unique
codeword. 

Hence, to decode, we could take the received word, look through the list of 
16 codewords to find the one which is equal to it or differs in one place only, 
decode to this codeword, and take the first four digits as the message. 
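
The claims of Example 1 can be confirmed by listing the codewords. In this sketch the matrix G is written out so that its last three columns give the check digits x5 = x2 + x3 + x4, x6 = x1 + x3 + x4, x7 = x1 + x2 + x4; treat it as an assumption and substitute the matrix from your copy if it differs.

```python
from itertools import product

# Generator matrix in standard form (I A), entries in Z_2 (assumed transcription).
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(x, G):
    """Compute the codeword xG over Z_2, one coordinate per column of G."""
    return tuple(sum(xi * gi for xi, gi in zip(x, column)) % 2 for column in zip(*G))

codewords = [encode(x, G) for x in product((0, 1), repeat=4)]

# Minimum weight over the non-zero codewords.
minimum_weight = min(sum(c) for c in codewords if any(c))

# Size of a ball of radius 1 in the space of binary words of length 7.
ball = 1 + 7 * (2 - 1)
```

With 16 codewords and balls of 8 words each, the balls account for all 128 binary words of length 7, which is what attaining the Hamming bound means.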


In the next section, we give a more efficient decoding method. 


8.3. Syndrome decoding. We begin by giving another description of a code. 
The motivation comes from either linear algebra or coding theory. 

In terms of linear algebra, we described a subspace of F'” as the row space of 
a matrix, in other words, the image of a linear transformation. It would be just 
as natural to use the kernel instead. 

In coding theory terms, the motivation is even more convincing. The received 
word has the form ‘codeword plus error’; we want to remove the error and reveal 
the codeword. However, the error is unknown, and we know that the codeword 
is chosen from a known subspace. So, instead, we remove the codeword to reveal 
the error, and then find the codeword by subtracting the error from the received 
word. Accordingly, we want a function f such that f maps every codeword to
zero but f is one-to-one on possible error patterns. Of course, f should also be
linear, so that

f(codeword + error) = f(codeword) + f(error) = f(error),

from which the error can be determined.


Definition Let C be a linear code with length n and dimension k. A check
matrix for C is an $(n - k) \times n$ matrix H with the property that, for any word
$w \in F^n$, we have $wH^\top = 0$ if and only if $w \in C$.


Note that we take H, like G, to have n columns, so that we have to transpose
it in the equation.
The word $wH^\top$ is called the syndrome of w.


Proposition 8.5 Let H be a check matrix for a linear e-error-correcting code.
Then, if $w_1$ and $w_2$ are any two words with weight at most e which have the
same syndrome, then $w_1 = w_2$.


Proof If $w_1H^\top = w_2H^\top$, then $(w_1 - w_2)H^\top = 0$, and so $w_1 - w_2 \in C$. But
$w_1 - w_2$ has weight at most 2e, whereas C has minimum (non-zero) weight at
least 2e + 1. So, necessarily, $w_1 - w_2 = 0$.


Thus our condition that different error patterns have different images is 
satisfied, at least for the errors that we can correct. 

The decoding now works as follows: Given the received word, calculate its 
syndrome, work out the error pattern which would produce that syndrome (for 
example, look it up in a table), and subtract that error pattern from the received 
word to give the codeword. 

In our example, it can be checked that the matrix

$$H = \begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1
\end{pmatrix}$$

is a check matrix for the code of Example 1 earlier. Since the code is
1-error-correcting, the relevant error patterns are 0 and the word $e_i$ with 1 in the ith
position and 0 elsewhere, for $i = 1, \ldots, 7$. The syndrome of 0 is 0; the
syndrome of $e_i$ is the ith row of $H^\top$, which by inspection happens to be the base 2
representation of the integer i. So decoding is particularly simple:

Given a received word w, calculate its syndrome $wH^\top$. If it is zero, then no
error occurred; otherwise, if the syndrome is the base 2 representation of i, then
the ith digit is incorrect.

Suppose, in our example, that we wish to send the message 1101. This is
encoded as 1101001. Suppose that an error occurs in the second position, giving
the received word 1001001. Multiplying by $H^\top$ gives the syndrome 010, which is
the second row of $H^\top$, or the base 2 representation of 2. So we correct the word
to 1101001, and extract the information 1101 correctly.

If, on the other hand, two errors occurred, say in positions 2 and 5, the
received word would be 1001101, with syndrome 111; we would decode by changing
the seventh digit, giving 1001100, and extract the wrong message 1001. Our
use of this code depends on the assumption that it is very unlikely that two or
more errors will occur during the transmission of a single word.
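The two worked cases above can be replayed mechanically. The following Python sketch uses the check matrix whose ith column is the base 2 representation of i; the helper names (`syndrome`, `correct`, `bits`) are our own choices for illustration.

```python
# Check matrix for the code of Example 1: column i is i written in base 2.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def bits(text):
    """Turn a string such as '1001001' into a list of integers."""
    return [int(ch) for ch in text]

def syndrome(word):
    """Return wH^T over Z_2 as a tuple of three bits."""
    return tuple(sum(w * h for w, h in zip(word, row)) % 2 for row in H)

def correct(word):
    """Flip the digit whose position is the syndrome read in base 2."""
    position = int("".join(map(str, syndrome(word))), 2)  # 0 means zero syndrome
    fixed = list(word)
    if position:
        fixed[position - 1] ^= 1
    return fixed

print(syndrome(bits("1001001")), correct(bits("1001001")))  # one error, position 2
print(syndrome(bits("1001101")), correct(bits("1001101")))  # two errors: miscorrected
```

Note that the decoder always returns some codeword; with two errors it has no way of knowing that the result is wrong.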

It is possible to look at syndrome decoding in another way. If an unknown
codeword is sent, and an error x occurs, then the received word belongs to the
coset C + x. Now C is the kernel of the linear transformation from $F^n$ to $F^{n-k}$
represented by $H^\top$, so each such coset maps to a single word in $F^{n-k}$. We can
correct any assumed set of error patterns which are mapped one-to-one by $H^\top$;
that is, at most one from each coset. It is natural to choose the word of smallest
weight in a coset, as the most likely error pattern to occur. Such a word is called
a coset leader. (If two or more words have the minimum weight in a coset,
choose among them arbitrarily.) Then any syndrome specifies a unique coset,
and we can decode using a table of syndromes and corresponding coset leaders.

Syndrome decoding can be used for any linear code, although in particular 
cases it may not necessarily be the most efficient way. 


The minimum weight of a linear code can be found from its check matrix as 
follows. 


Proposition 8.6 Let C be a linear code with check matrix H. Then C has
minimum weight $\delta$ or greater if and only if any $\delta - 1$ columns of H are linearly
independent.


Proof Let $h_1, \ldots, h_n$ be the columns of H. Then $c_1 \cdots c_n$ is a codeword if
and only if $c_1h_1 + \cdots + c_nh_n = 0$. So codewords of weight $l$ correspond to
dependence relations among sets of $l$ columns; and the minimum weight is equal
to the minimum size of a set of linearly dependent columns.
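For a small binary code, the proposition can be checked by brute force. The sketch below assumes the field is $\mathbb{Z}_2$, where a set of columns is minimally dependent exactly when the columns sum to zero; the function name is ours, and the matrix is the check matrix used in the syndrome decoding example.

```python
from itertools import combinations

def min_weight_from_check_matrix(H):
    """Size of the smallest set of columns of a binary matrix H summing to zero."""
    columns = list(zip(*H))
    n = len(columns)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # Over Z_2 the only non-zero scalar is 1, so test the plain sum.
            if all(sum(columns[j][i] for j in subset) % 2 == 0
                   for i in range(len(H))):
                return size
    return None  # the columns are linearly independent

H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]
print(min_weight_from_check_matrix(H))
```

The search is exponential in the length, so it is only a check on small examples, not a practical algorithm.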


Using the check matrix, it is easy to construct an important family of codes,
the Hamming codes. Let r be given, let F = GF(q), and let $(F^r)^\top$ be the
r-dimensional vector space of all columns of length r (that is, the $r \times 1$ matrices).
Let $X = (F^r)^\top \setminus \{0\}$. Then $|X| = q^r - 1$. Call two elements of X equivalent
if one is a scalar multiple of the other. Each equivalence class contains q - 1
elements (since there are q - 1 non-zero scalars); so there are $n = (q^r - 1)/(q - 1)$
equivalence classes. Let Y be a set of representatives of the equivalence classes.
Thus Y consists of one non-zero vector from each 1-dimensional subspace of
$(F^r)^\top$. Let H be the $r \times n$ matrix whose columns are the elements of Y. Then
we define the q-ary Hamming code of length n to be the linear code with check
matrix H.


Proposition 8.7 Hamming codes are perfect 1-error-correcting (that is, they 
attain the Hamming bound). 


Proof By construction, no column is zero, and no column is a multiple of
another. So any two columns are linearly independent. By Proposition 8.6, the
code has minimum weight at least 3, and so it is 1-error-correcting. Since

$$|C| = q^{n-r} = q^n/(1 + n(q - 1)),$$

the Hamming bound of Theorem 8.3(a) is attained.


We can make the choice of columns definite by taking from each equivalence
class the unique vector whose first non-zero element is 1. For example, the ternary
(over GF(3)) Hamming code of length $(3^3 - 1)/(3 - 1) = 13$ has check matrix

$$\begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 2 & 2 & 2 \\
1 & 0 & 1 & 2 & 0 & 1 & 2 & 0 & 1 & 2 & 0 & 1 & 2
\end{pmatrix}.$$
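The rule "first non-zero element is 1" translates directly into a short program. The following sketch assumes q is prime, so that the field can be taken to be the integers mod q; the function name is our own.

```python
from itertools import product

def hamming_check_matrix(q, r):
    """Check matrix of the q-ary Hamming code with r rows (q assumed prime).

    The columns are the vectors of length r over Z_q whose first
    non-zero entry is 1, listed in lexicographic order.
    """
    columns = []
    for v in product(range(q), repeat=r):
        nonzero = [x for x in v if x != 0]
        if nonzero and nonzero[0] == 1:
            columns.append(v)
    # Transpose the list of columns into r rows of length n.
    return [list(row) for row in zip(*columns)]

for row in hamming_check_matrix(3, 3):
    print(*row)
```

With q = 3 and r = 3 this lists 13 columns in the same order as the matrix displayed above; with q = 2 and r = 3 it gives the columns 1 to 7 in base 2.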


One problem remains: How do we find a check matrix for a code? 


Theorem 8.8 (a) Let G and H be matrices of size $k \times n$ and $(n - k) \times n$
respectively over a finite field F, both having linearly independent rows. Then
G and H are the generator and check matrices for the same code if and only if
$GH^\top = 0$.

(b) Let $G = (I \;\; A)$ be a generator matrix of a code C, in standard form. Then
a check matrix of the same code is $H = (-A^\top \;\; I)$.


Proof (a) By assumption, the row space $C$ of G and the null space $C'$ of H
both have dimension k. Now $GH^\top = 0$ is equivalent to the assertion that every
row of G lies in $C'$; that is, that $C \subseteq C'$. Since the two spaces have the same
dimension, this holds if and only if $C = C'$. So the result follows.

(b) The identity blocks in G and H ensure that their rows are linearly
independent; and $GH^\top = -IA + AI = 0$.


This gives a simple way to compute H, if G is in standard form. 

In general, apply elementary row operations to G to bring it to reduced
echelon form. If it is in standard form, proceed as before. Otherwise, apply a
permutation $\pi$ to its columns to bring it to standard form; then construct the
check matrix H for this standard form of G, and apply the inverse of $\pi$ to its columns.
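For a generator matrix already in standard form, part (b) of the theorem is a one-line recipe. The sketch below assumes the field is the integers mod a prime q and that G is given as a list of rows; both function names are ours.

```python
def check_matrix_from_standard_form(G, q):
    """Given G = (I A) over Z_q (q assumed prime), return H = (-A^T I)."""
    k, n = len(G), len(G[0])
    A = [row[k:] for row in G]                       # the k x (n - k) block A
    H = []
    for i in range(n - k):
        minus_a_transpose = [(-A[j][i]) % q for j in range(k)]
        identity_row = [1 if t == i else 0 for t in range(n - k)]
        H.append(minus_a_transpose + identity_row)
    return H

def product_with_transpose(G, H, q):
    """Return the matrix G H^T over Z_q."""
    return [[sum(g * h for g, h in zip(g_row, h_row)) % q for h_row in H]
            for g_row in G]

G = [[1, 0, 1, 2],
     [0, 1, 1, 1]]                                   # a small ternary example
H = check_matrix_from_standard_form(G, 3)
print(H, product_with_transpose(G, H, 3))
```

The second function confirms the condition $GH^\top = 0$ of part (a) for the matrix just constructed.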


8.4 Cyclic codes. Cyclic codes form a subclass of the class of linear codes. 
For these codes, an even more precise algebraic description is possible, leading 
to improved decoding algorithms. 


Definition Let C be a linear code of length n over a field F. Then C is a cyclic
code if, for every word $w = a_1a_2 \cdots a_n \in C$, the cyclic shift $a_na_1a_2 \cdots a_{n-1}$ is
also in C.


We translate into algebra in the following way. It is convenient to change
notation, and number the coordinates from 0 to n - 1, instead of from 1 to
n. Now, with each word $w = a_0a_1 \cdots a_{n-1} \in F^n$, we associate the polynomial
$w(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} \in F[x]$.

Let I be the ideal of $F[x]$ generated by $x^n - 1$, and let R be the factor
ring $F[x]/I$. Now each coset of I in $F[x]$ has a unique representative which is a
polynomial of degree at most n - 1 (or zero). For, given any polynomial f(x),
we can use the division algorithm to write

$$f(x) = (x^n - 1)q(x) + r(x),$$

where r = 0 or deg(r) < n; and f(x) and r(x) lie in the same coset of I.

So there is a natural bijection between $R = F[x]/I$ and the set of polynomials
of degree at most n - 1 (together with 0), and hence with the set $F^n$ of words
of length n. We will switch freely between these sets.


Proposition 8.9 A code C of length n is cyclic if and only if the corresponding 
elements of R form an ideal. 


Proof Multiplication by x corresponds to the cyclic shift. For consider a word
$w = a_0a_1 \cdots a_{n-1}$. The corresponding polynomial is $a_0 + a_1x + \cdots + a_{n-1}x^{n-1}$.
Multiplying by x gives $a_0x + a_1x^2 + \cdots + a_{n-1}x^n$. Now $x^n$ and 1 lie in the same
coset of $I = (x^n - 1)$, so are equal in the factor ring. Thus, in R, the above
‘polynomial’ is equal to $a_{n-1} + a_0x + a_1x^2 + \cdots$, which corresponds to the word
$a_{n-1}a_0a_1 \cdots$, the cyclic shift of w.

Thus, if C is an ideal, it is closed under addition, and under multiplication
by any scalar (hence it is a linear code), and under multiplication by x, in other
words cyclic shift (and hence it is a cyclic code). Conversely, suppose that C is
a cyclic code. Then it is closed under addition, and under multiplication by any
scalar or by x. Combining these two operations, we can build any polynomial,
so C is closed under multiplying by any polynomial, and so is an ideal.
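The first step of the proof is easy to see in code. In the following sketch a word is a Python list of coefficients $a_0, \ldots, a_{n-1}$; the two function names are our own.

```python
def times_x(word):
    """Multiply a_0 + a_1 x + ... + a_{n-1} x^{n-1} by x, modulo x^n - 1."""
    n = len(word)
    shifted = [0] + list(word)     # coefficients of x * w(x), degree up to n
    shifted[0] += shifted[n]       # replace x^n by 1
    return shifted[:n]

def cyclic_shift(word):
    """Move the last coordinate to the front."""
    return [word[-1]] + list(word[:-1])

w = [1, 1, 0, 1, 0, 0, 0]          # the polynomial 1 + x + x^3, with n = 7
print(times_x(w), cyclic_shift(w))
```

Both functions return the same list for every input, which is exactly the statement that multiplication by x in R is the cyclic shift.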


The problem now is to describe the ideals in the ring R. First, we observe 
that they are all principal. 


Proposition 8.10 Let R be a commutative ring with identity in which every 
ideal is generated by a single element. Then the same properties hold for any 
factor ring of R. 


Proof Consider a factor ring R/I. By the Second Isomorphism Theorem, its 
ideals are all of the form J/I, where J is an ideal of R containing I. Now, if 
J =(r), then J/I = (I+r). 


We cannot assume that the factor ring R/I is a principal ideal domain, however,
even if R is; for it may not be an integral domain. For example, a factor
ring of $\mathbb{Z}$ has the form $\mathbb{Z}_m$ for some m; all of its ideals are principal (by the
proposition), but it is an integral domain only if m is prime (in which case it is
a field).


Proposition 8.11 Any ideal of $F[x]/(x^n - 1)$ is generated by (the coset
containing) a monic polynomial g(x) which divides $x^n - 1$. There is a unique such
polynomial for any ideal.


Remark The polynomial g(x) is called the generator polynomial of the
cyclic code corresponding to the ideal (g(x)).


Proof Suppose that the ideal I is generated by f(x). Let g(x) be the g.c.d. of
f(x) and $x^n - 1$ (in $F[x]$), chosen to be monic. Then g divides f, so (g) contains
f. But also, by the Euclidean Algorithm, $g(x) = a(x)f(x) + b(x)(x^n - 1)$. In the
factor ring R, this equation says g(x) = a(x)f(x); so also f divides g, and (f)
contains (g). Thus, g generates the ideal I. It is by definition a monic polynomial
dividing $x^n - 1$.

The uniqueness follows from the Second Isomorphism Theorem. By assuming
that our polynomial divides $x^n - 1$, we see that the ideal of $F[x]$ that it generates
contains $(x^n - 1)$. So, if $g_1$ and $g_2$ were two such polynomials, they would generate
the same ideal of $F[x]$. Hence they would be associates in $F[x]$; that is, they would
differ only by a scalar factor. Since both are monic, they would be equal.


So to construct all cyclic codes of length n, we must factorise $x^n - 1$ into
irreducibles in $F[x]$, then list all divisors of $x^n - 1$ (the products of some of these
irreducibles), and for each divisor, form the corresponding ideal of R.


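Theorem 8.12 Suppose that g(x) is the generator polynomial of the cyclic code
C. Let

$$g(x) = a_{n-k}x^{n-k} + \cdots + a_1x + a_0,$$

with $a_{n-k} = 1$, and let $g(x)h(x) = x^n - 1$, where

$$h(x) = b_kx^k + b_{k-1}x^{k-1} + \cdots + b_0,$$

with $b_k = 1$. Then a generator matrix G and a check matrix H for C are
given by

$$G = \begin{pmatrix}
a_0 & a_1 & \cdots & a_{n-k} & 0 & \cdots & 0 \\
0 & a_0 & a_1 & \cdots & a_{n-k} & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & a_0 & a_1 & \cdots & a_{n-k}
\end{pmatrix},$$

$$H = \begin{pmatrix}
b_k & b_{k-1} & \cdots & b_0 & 0 & \cdots & 0 \\
0 & b_k & b_{k-1} & \cdots & b_0 & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & b_k & b_{k-1} & \cdots & b_0
\end{pmatrix}.$$


Remark Both G and H are in echelon form. So the theorem implies that
dim(C) = k = n - deg(g(x)).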


Proof The rows of G correspond to the polynomials $g(x), xg(x), \ldots, x^{k-1}g(x)$,
so they all belong to C.


Take any word $w \in C$, corresponding to a polynomial f(x)g(x) (mod $x^n - 1$).
Write f(x) = h(x)q(x) + r(x), where r = 0 or deg(r) < k. Then f(x)g(x) =
$(x^n - 1)q(x) + r(x)g(x)$, and this is congruent mod $x^n - 1$ to r(x)g(x), which
is a linear combination of the polynomials $x^ig(x)$ for i < k. Hence w is a linear
combination of the rows of G. So the row space of G is C, as claimed.

The (i, j) entry of G is $a_{j-i}$, with the convention that $a_l = 0$ if $l$ is outside
the range [0, n - k]. With a similar convention, the (i, j) entry of H is $b_{k-j+i}$.
Hence the (i, j) entry of $GH^\top$ is

$$\sum_l a_{l-i}\, b_{k-l+j} = \sum_m a_m\, b_{k-i+j-m}.$$

This is the (k - i + j)th coefficient of the product gh. Now $1 \le i \le k$ and
$1 \le j \le n - k$, so

$$k - k + 1 = 1 \le k - i + j \le k - 1 + (n - k) = n - 1.$$

But $g(x)h(x) = x^n - 1$, so all the relevant coefficients are zero. Thus $GH^\top = 0$,
and it follows from Theorem 8.8 that H is a check matrix for C.
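The construction in Theorem 8.12 can be tried on a small case. The sketch below takes the binary polynomials $g(x) = x^3 + x + 1$ and $h(x) = x^4 + x^2 + x + 1$, whose product is $x^7 - 1$ over GF(2); the function name and the list representation of polynomials are our own conventions.

```python
def cyclic_code_matrices(a, b):
    """Build G and H from coefficient lists a = [a_0..a_{n-k}], b = [b_0..b_k].

    Row i of G is a_0 ... a_{n-k} shifted i places to the right;
    row i of H is b_k ... b_0 shifted i places to the right.
    """
    k = len(b) - 1
    n = k + len(a) - 1
    G = [[0] * i + a + [0] * (k - 1 - i) for i in range(k)]
    reversed_b = b[::-1]
    H = [[0] * i + reversed_b + [0] * (n - k - 1 - i) for i in range(n - k)]
    return G, H

a = [1, 1, 0, 1]        # g(x) = 1 + x + x^3
b = [1, 1, 1, 0, 1]     # h(x) = 1 + x + x^2 + x^4
G, H = cyclic_code_matrices(a, b)

# Every entry of G H^T should vanish mod 2.
GHt = [[sum(x * y for x, y in zip(g_row, h_row)) % 2 for h_row in H] for g_row in G]
print(GHt)
```

Here G has k = 4 rows and H has n - k = 3 rows, each of length 7.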


Example We consider binary cyclic codes of length 7. We have

$$x^7 - 1 = (x - 1)(x^3 + x + 1)(x^3 + x^2 + 1).$$

Thus there are $2^3 = 8$ cyclic codes, corresponding to the divisors of $x^7 - 1$ as
follows:

$g(x) = 1$. This code is generated by 1000000 and its cyclic shifts, and so it is
the whole of $F^7$.

$g(x) = x - 1$. The code is generated by the word 1100000 and its cyclic shifts,
and consists of all words of even weight. The dimension is 6 and the minimum
weight is 2.

$g(x) = x^3 + x + 1$. This code consists of the zero word, 1101000 and its cyclic
shifts, and every word obtained by interchanging zeros and ones. These 16
words form a code with dimension 4 and minimum weight 3 (and hence it is
1-error-correcting). This is the code that we met in Example 1 earlier.

$g(x) = x^3 + x^2 + 1$. This code is obtained from the previous one by reversing all
the codewords; so it is ‘equivalent’ (it has the same dimension and minimum
weight).

$g(x) = (x - 1)(x^3 + x + 1)$. This code consists of the eight words of even weight
in the earlier example (with $g(x) = x^3 + x + 1$). It has minimum weight 4.

$g(x) = (x - 1)(x^3 + x^2 + 1)$. This is the reverse of the preceding code.

$g(x) = x^6 + \cdots + x + 1$. This is the repetition code spanned by the all-1 vector.

$g(x) = x^7 - 1$. This is not a code at all, since it contains only the zero vector:
we require a code to have at least two codewords!
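The dimensions and minimum weights quoted in this example can be confirmed by enumeration. In the sketch below a binary polynomial is stored as an integer whose ith bit is the coefficient of $x^i$; this representation and the function names are our own.

```python
from itertools import combinations

N = 7
# Irreducible factors of x^7 - 1 over GF(2): x + 1, x^3 + x + 1, x^3 + x^2 + 1.
FACTORS = [0b11, 0b1011, 0b1101]

def gf2_mul(a, b):
    """Multiply two polynomials over GF(2) stored as bit patterns."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def cyclic_codes():
    """Map each divisor g of x^7 - 1 to (dimension, minimum weight)."""
    table = {}
    for size in range(len(FACTORS) + 1):
        for chosen in combinations(FACTORS, size):
            g = 1
            for factor in chosen:
                g = gf2_mul(g, factor)
            k = N - (g.bit_length() - 1)
            # Non-zero codewords are the multiples f(x)g(x) with 0 <= deg(f) < k.
            weights = [bin(gf2_mul(f, g)).count("1") for f in range(1, 2 ** k)]
            table[g] = (k, min(weights) if weights else None)
    return table

for g, (k, weight) in sorted(cyclic_codes().items()):
    print(f"g = {g:08b}  dimension {k}  minimum weight {weight}")
```

The last divisor, $x^7 - 1$ itself, is reported with dimension 0 and no minimum weight, matching the remark that it gives no code.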


8.5 BCH codes. In the last two sections, we have seen ways to construct
codes for which the length and dimension are easy to calculate. Finding the
minimum distance (or minimum weight), however, is much harder. In this section we
examine a construction which enables one to specify a length n and a minimum
distance d, and find a code of length n and minimum distance at least d. Moreover,
we can also give a lower bound for the dimension of the constructed code.
The construction was given independently by Bose and Ray-Chaudhuri and by
Hocquenghem; so the codes should be called BRH codes, but the term ‘BCH
codes’ has become standard.

The BCH codes depend on properties of finite fields. We now review these 
properties. 


Definition Let n and q be coprime integers. The order of q mod n is the
smallest positive integer e such that $q^e \equiv 1 \pmod n$. (Note that the condition (n, q) = 1
implies that q is a unit mod n; the order of q mod n is just the order of q as an
element of the group of units of $\mathbb{Z}_n$.)


Definition A primitive nth root of unity in a field F is an element $\alpha$ whose
order in the multiplicative group of F is precisely n (that is, $\alpha^n = 1$ but $\alpha^m \ne 1$
for 0 < m < n).


Proposition 8.13 Let q be a prime power, and let e be the order of q mod n.
Then the smallest field which contains GF(q) and a primitive nth root of unity
is $\mathrm{GF}(q^e)$.


Proof Any finite field containing GF(q) has the form $\mathrm{GF}(q^m)$ for some m. If $\mathrm{GF}(q^m)$
contains a primitive nth root of unity, then its multiplicative group contains a
subgroup of order n, and so n divides $q^m - 1$. Conversely, if n divides $q^m - 1$, then
the multiplicative group of $\mathrm{GF}(q^m)$ (being cyclic) contains a cyclic subgroup of
order n, so that $\mathrm{GF}(q^m)$ contains a primitive nth root of unity. Now n divides
$q^m - 1$ precisely when $q^m \equiv 1 \pmod n$, and the smallest such m is e.
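The order of q mod n is easily computed by repeated multiplication. A minimal sketch (the function name `order_mod` is ours):

```python
from math import gcd

def order_mod(q, n):
    """Smallest e >= 1 with q**e congruent to 1 mod n; needs gcd(q, n) == 1."""
    if gcd(q, n) != 1:
        raise ValueError("q and n must be coprime")
    e, power = 1, q % n
    while power != 1 % n:
        power = (power * q) % n
        e += 1
    return e

print(order_mod(2, 15))   # binary codes of length 15 need the field GF(2^4)
print(order_mod(3, 13))   # ternary codes of length 13 need the field GF(3^3)
```

By the proposition, the value returned is the exponent e of the smallest extension field $\mathrm{GF}(q^e)$ containing a primitive nth root of unity.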


We also need the basic property of Vandermonde determinants (see
Section 4.11): If $a_1, \ldots, a_n$ are distinct elements of a field F, and A is the $n \times n$
matrix with (i, j) entry $a_j^{i-1}$ for $1 \le i, j \le n$, then $\det(A) \ne 0$.

The codes that we construct will be cyclic codes of length n over GF(q),
where we assume that n and q are coprime. We are also given a positive integer
$\delta$. We define the BCH code of length n and designed distance $\delta$. We will
prove that the actual minimum distance is at least $\delta$.

Let e be the order of q mod n, and let $\alpha$ be a primitive nth root of unity in
$\mathrm{GF}(q^e)$. We take a representation of $\mathrm{GF}(q^e)$ by e-tuples of elements of GF(q), in
the standard way: if $a$ generates $\mathrm{GF}(q^e)$ over GF(q) and satisfies the polynomial
f(x) = 0, where deg(f) = e, then every element of $\mathrm{GF}(q^e)$ can be written
uniquely as $c_0 + c_1a + \cdots + c_{e-1}a^{e-1}$, and can be represented by the e-tuple
$c_0c_1 \cdots c_{e-1}$. For technical reasons, we use here the transpose of this e-tuple: that
is, we represent each element by an $e \times 1$ matrix. (The actual element $a$ used is
unimportant, but we may choose $a = \alpha$.)

Now the BCH code of length n and designed distance $\delta$ over GF(q)
is the code with check matrix

$$H = \begin{pmatrix}
1 & \alpha & \alpha^2 & \cdots & \alpha^{n-1} \\
1 & \alpha^2 & \alpha^4 & \cdots & \alpha^{2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \alpha^{\delta-1} & \alpha^{2(\delta-1)} & \cdots & \alpha^{(n-1)(\delta-1)}
\end{pmatrix}.$$

Each matrix element belongs to the field $\mathrm{GF}(q^e)$, and so is represented as an
$e \times 1$ matrix over GF(q). Thus, H is an $e(\delta - 1) \times n$ matrix over GF(q).


Theorem 8.14 The BCH code of length n and designed distance $\delta$ over GF(q)
has minimum distance at least $\delta$, and has dimension at least $n - e(\delta - 1)$.


Proof To show that the minimum distance of a code is at least $\delta$, it is necessary
and sufficient that any $\delta - 1$ columns of a check matrix for the code are linearly
independent (see Proposition 8.6). Consider the determinant of the matrix (over
$\mathrm{GF}(q^e)$) formed by columns $m_1, m_2, \ldots, m_{\delta-1}$. This is

$$\det \begin{pmatrix}
\alpha^{m_1} & \cdots & \alpha^{m_{\delta-1}} \\
\vdots & & \vdots \\
\alpha^{m_1(\delta-1)} & \cdots & \alpha^{m_{\delta-1}(\delta-1)}
\end{pmatrix}.$$

The ith column has a common factor $\alpha^{m_i} \ne 0$. Taking out these factors, we
obtain a Vandermonde determinant $V(\alpha^{m_1}, \ldots, \alpha^{m_{\delta-1}})$, which is also non-zero
(the elements $\alpha^{m_i}$ are distinct, since $\alpha$ has order n and the $m_i$ are distinct
integers between 0 and n - 1).

So the chosen columns are linearly independent over $\mathrm{GF}(q^e)$, and so certainly
over the smaller field GF(q).


The dimension of the code is n minus the rank of the check matrix. This
rank is not greater than the number $e(\delta - 1)$ of rows. (The rows may not be all
independent.)


It is not obvious from the definition, but the following is true:

Proposition 8.15 BCH codes are cyclic.


Proof Any word $v = c_0c_1 \cdots c_{n-1}$ corresponds, as in the last section, to a
polynomial $f(x) = c_0 + c_1x + \cdots + c_{n-1}x^{n-1}$. The conditions that the word lies
in the BCH code can be written as

$$f(\alpha^j) = c_0 + c_1\alpha^j + \cdots + c_{n-1}\alpha^{(n-1)j} = 0,$$

for $j = 1, 2, \ldots, \delta - 1$. Let g(x) be the least common multiple of the minimal
polynomials of $\alpha, \alpha^2, \ldots, \alpha^{\delta-1}$ over GF(q). Then the word corresponding to f(x)
lies in the BCH code if and only if f(x) is divisible by g(x). Moreover, the roots
of g(x) are nth roots of unity, so g(x) divides $x^n - 1$. So the BCH code is the
cyclic code with generator polynomial g(x).

In many cases, we can use this observation to do better than the previous
lower bound for the dimension. This depends on the following fact:


Proposition 8.16 Let $f(x) \in \mathrm{GF}(q)[x]$, and suppose that $\alpha$ is a root of f in
some extension field. Then $\alpha^q$ is also a root of f.


Proof Let $q = p^r$, where p is prime. Then the map $x \mapsto x^q$ is the rth power of
the Frobenius map, and hence is an automorphism of the field $K = \mathrm{GF}(q)(\alpha)$.
Let $f(x) = a_dx^d + \cdots + a_0$, so that

$$f(\alpha) = a_d\alpha^d + \cdots + a_0 = 0.$$

We apply the automorphism $x \mapsto x^q$ to this equation, noting that $a_i^q = a_i$ for all
i (since the coefficients $a_i$ belong to GF(q), and so are roots of the polynomial
$x^q - x$). So

$$f(\alpha^q) = a_d\alpha^{qd} + \cdots + a_0 = 0.$$


Now let $v = c_0c_1 \cdots c_{n-1}$ be a word of length n. By the preceding result, if we
can find two values i and j such that $j \equiv iq^m \pmod n$ for some m, then the ith and jth
conditions above are equivalent, and we can strike out the jth row of H. This
does not affect the code, but means that the check matrix has fewer rows, so
we obtain a larger lower bound for the dimension of the code.


Example Consider the binary BCH code C with length 15 and designed
distance 5. The codewords correspond to the polynomials having roots
$\alpha, \alpha^2, \alpha^3, \alpha^4$, where $\alpha$ is a primitive 15th root of unity (in $\mathrm{GF}(2^4)$: note that
the order of 2 mod 15 is 4). This code has minimum weight 5 (and so is
2-error-correcting). The lower bound we gave earlier for its dimension is
$\dim(C) \ge 15 - 4 \cdot 4 = -1$: this is of course useless! But, by the previous result, $\alpha^2$ and
$\alpha^4$ are unnecessary: it is enough to assume that $\alpha$ and $\alpha^3$ are roots. This gives
$\dim(C) \ge 15 - 4 \cdot 2 = 7$, so $|C| \ge 2^7 = 128$.

If we take $\alpha$ to be a root of the polynomial $x^4 + x + 1$ over GF(2), then
$\alpha^3$ (which is a 5th root of unity) is a root of $x^4 + x^3 + x^2 + x + 1$. So the
generator polynomial of the code is the product of these two polynomials. Since they
are irreducible, we see that the dimension of C is exactly 15 - 8 = 7.
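The counting in this example amounts to closing the set of exponents {1, ..., $\delta$ - 1} under multiplication by q mod n: each exponent reached is a root of the generator polynomial, and distinct exponents give distinct roots. The following sketch computes the resulting dimension; the function name is ours, and it assumes n and q are coprime.

```python
def bch_dimension(q, n, delta):
    """Dimension of the BCH code of length n, designed distance delta, over GF(q).

    Collects every exponent j*q^m mod n for 1 <= j < delta (Proposition 8.16);
    the generator polynomial has exactly one root for each such exponent.
    """
    roots = set()
    for j in range(1, delta):
        t = j % n
        while t not in roots:
            roots.add(t)
            t = (t * q) % n
    return n - len(roots)

print(bch_dimension(2, 15, 5))
```

For length 15 and designed distance 5 the exponents found are 1, 2, 4, 8 and 3, 6, 12, 9, so the generator polynomial has degree 8, in agreement with the example.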

The Hamming bound for a 2-error-correcting code of length 15 gives $|C| \le
2^{15}/(1 + 15 + \binom{15}{2}) = 270.81\ldots$; since C is a linear code, the number of
codewords is a power of 2, so in fact $|C| \le 256$. So the dimension of C is within one
of best possible.
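The arithmetic of the Hamming bound is quickly checked. A small sketch (the function name is ours; `math.comb` requires Python 3.8 or later):

```python
from math import comb

def hamming_bound(q, n, e):
    """Upper bound q^n / sum_{i <= e} C(n, i)(q - 1)^i for an e-error-correcting code."""
    ball = sum(comb(n, i) * (q - 1) ** i for i in range(e + 1))
    return q ** n / ball

bound = hamming_bound(2, 15, 2)
largest_power = 2 ** (int(bound).bit_length() - 1)   # largest power of 2 below the bound
print(round(bound, 2), largest_power)
```

The same function returns exactly 16 for the binary code of length 7 in Example 1, which is the statement that that code is perfect.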


One important special case of BCH codes is that when n = q - 1. These
codes were discovered earlier, and are called Reed-Solomon codes. In this
case, the order of q mod n is clearly equal to 1. So the BCH bound for a code
of designed distance $\delta$ gives $\dim(C) \ge n - \delta + 1$. On the other hand, if the true
minimum distance is d, then $d \ge \delta$; and the Singleton bound (Theorem 8.3(b))
gives $|C| \le q^{n-d+1}$, whence $\dim(C) \le n - d + 1$. Summarising, we have

$$n - d + 1 \le n - \delta + 1 \le \dim(C) \le n - d + 1,$$

so equality must hold throughout. Thus, we have


Proposition 8.17 A Reed-Solomon code (that is, a BCH code of length
n = q - 1 over GF(q)) with designed distance $\delta$ has minimum distance $\delta$ and
dimension $n - \delta + 1$. Hence it is maximum distance separable (that is, it attains
the Singleton bound).


Exercise 8.1 Consider the code over the alphabet {1,2,3} whose words are 
112233, 223311, 331122, 123123, 231231, 312312. Find the minimum distance of this 
code. 


Exercise 8.2 A channel transmits binary digits in blocks of length 8. Because of 
synchronisation problems, errors are more likely in the first four bits than in the second 
four; the probability of incorrect transmission of a bit is Tr for the first four bits and 
iio for the others. We use the following scheme to encode 4 bits of information for 
transmission through the channel. The input $a_1a_2a_3a_4$ is encoded as
$b_1b_2b_3b_4b_5b_6b_7b_8$, where


(i) the first bit is repeated four times, that is,
$b_1 = b_2 = b_3 = b_4 = a_1$;


(ii) the next three bits are sent without change, that is,
$b_5 = a_2$, $b_6 = a_3$, $b_7 = a_4$;


(iii) the last bit is a ‘parity check’ for the three preceding, i.e. an even number of
$b_5, b_6, b_7, b_8$ are equal to 1.


[For example, 1010 is encoded as 11110101.] Decoding is done as follows: Suppose
that $c_1c_2c_3c_4c_5c_6c_7c_8$ is received.


(a) If all or all but one of $c_1, c_2, c_3, c_4$ agree, we assume that their common value
is $a_1$. If two of them are 0 and two are 1, we declare a decoding failure (‘an error has
occurred but we cannot correct it’).

(b) If the last four bits have even parity, that is, an even number of them are 1, we
assume correct transmission, and set $a_2 = c_5$, $a_3 = c_6$, $a_4 = c_7$. Otherwise, we declare a
decoding failure.


[For example, 11101111 is decoded as 1111, while 11101101 gives a decoding
failure.]


Problems (a) Let C be the code, that is the set of all possible transmitted words.
How many words are there in C? What is the rate of C?

(b) Prove that C is a linear code. What is its dimension? Write down a generator
matrix for C.

(c) Calculate the probabilities of incorrect decoding and of decoding failure using
this scheme. [Hint: The first four and the last four bits work quite independently, and
so can be treated separately. In the first four,

0 or 1 error: correct; 2 errors: failure; 3 or 4 errors: incorrect;

while in the last four,

0 errors: correct; 1 or 3 errors: failure; 2 or 4 errors: incorrect.

Now make a 3 x 3 table; work out the probability of each of the 9 outcomes, and the
overall result (correct, incorrect, or failure) in each case.]


Exercise 8.3 Let C’ be the linear code (over Z3) with generator matrix 


(a) How many words are there in C?

(b) What is the minimum distance of C?

(c) What is the minimum weight of C?

(d) Find a check matrix for C (a matrix whose null space is C).

(e) Encode 12, and decode 1021, using C.


Exercise 8.4 A binary code C of length 8 has generator matrix 


eRe 


1 
1 
1 
0 


OOF F 
rFPOrRF 


(a) Find a generator matrix in reduced echelon form. 

(b) What is the minimum weight of C? 

(c) Show that C can correct one error. 

(d) Show that, if two errors occur during transmission, then C can detect this, but 
cannot locate the position of the errors. 

(e) Calculate a check matrix for C. 

(f) Which syndromes correspond to the occurrence of one error? 

(g) Decode the received word 10101101. 


Exercise 8.5 Show that a repetition code with q = 2 and n odd attains the Hamming
bound.


Exercise 8.6 Verify the following table of values of the Hamming upper bound M for
the maximum size of a binary code of length 10 which can correct up to e errors:

e     1     2     3     4
M    93    18     5     2

Prove that, in fact, a 3-error-correcting code of length 10 cannot contain more than
two codewords. (Thus the Hamming bound is not always met!)

Construct a linear 1-error-correcting code of length 10 containing 32 codewords.

(*) Can you find one with 64 codewords?


Exercise 8.7 Show that, in any binary linear code, the set of codewords of even weight
is a linear subcode (that is, a subspace of the vector space).


Exercise 8.8 (a) Let C be a linear code with check matrix H. Show that C is
1-error-correcting if and only if no column of H is zero and no column is a multiple of
another.

(b) Suppose that the conditions of (a) hold. Verify the following rule for correcting
one error:

Calculate the syndrome of the received word w. If it is zero, then w is correct.
If it is a scalar multiple (say c) of the ith row of $H^\top$, then subtract c from the ith
coordinate of w.


Exercise 8.9 (a) Let F be the finite field with four elements $\{0, 1, \omega, \bar\omega\}$. The
arithmetic operations in F can be deduced from the rules

$$1 + 1 = 0, \qquad 1 + \omega = \omega^2 = \bar\omega.$$

Construct addition and multiplication tables for F.
(b) Let C be the linear code over F' with generator matrix 


10011 41 
0103150 @w 
00110 yw 


Find the minimum weight of C’, and a check matrix for C. 
(c) Prove that no code over an alphabet of four symbols with the same length and 
minimum distance as C can contain more codewords than C. 


Exercise 8.10 Prove that, if a perfect 2-error-correcting ternary code of length n
exists, then $2n^2 + 1$ must be a power of 3.


Exercise 8.11 What is the dimension of the binary BCH code of designed distance 5 
and length 31? 


Exercise 8.12 A football pools competition requires contestants to predict the
possible result (home win, away win, or draw) of each of n matches. Show that, in
order to ensure that all or all but one of the predictions are correct, at least $3^n/(2n + 1)$
entries are required. Explain why this bound can be met whenever $n = (3^d - 1)/2$ for
some positive integer d.


Exercise 8.13 (*) Prove that a q-ary Hamming code of length n is cyclic
if (q - 1, n) = 1.


Galois Theory 


In Chapter 1 we saw the classical formula for the solution of a quadratic equation.
Similar formulae for solving cubic and quartic equations were discovered by
Tartaglia and Ferrari, and publicised by Cardano, in the Renaissance.
Mathematicians searched unsuccessfully for such formulae for equations of higher
degree. The fact that no such formulae can exist is just one detail in Galois
Theory, which we now outline.

Galois himself, who has a good claim to be regarded as the founder of modern
algebra, was killed in a duel in 1832 at the age of 21. Perhaps he would have
performed better in the duel but for the fact that he had spent the night before in
frantic activity, writing an account of all of his discoveries to his friend Chevalier.
The letter was not published for 15 years, and Galois’ work only found its rightful
place in mathematics in the second half of the nineteenth century.


8.6 Normality and separability. We are concerned here only with finite
extensions of fields. Recall that the field K containing F is a finite extension
of F if the dimension of K as an F-vector space (forgetting multiplication in K) is
finite; this dimension is called the degree of the extension, written [K : F].
Recall also that a splitting field of a polynomial f(x) over F is a field
generated over F by the roots of f. It is a finite extension with degree at most n!,
where n is the degree of f. Galois theory is concerned with finding all the roots of
a polynomial, so splitting fields are important. Our first job is to recognise them.


Definition Let K be an extension of F. We say that K is a normal extension
of F if, whenever f is an irreducible polynomial in $F[x]$ which has one root in
K, then all the roots of f lie in K (that is, f splits into linear factors in $K[x]$).


‘Normal’ is a much overused word in mathematics. But at this point, there 
should be not too much risk of confusion between normal extensions of fields and 
normal subgroups of groups. 

Normality is a property of extensions rather than individual fields. We often
say ‘The extension K/F is normal’, although we have not defined an actual object
K/F. We read K/F as ‘K over F’.


For example, $\mathbb{C}$ is a normal extension of $\mathbb{R}$, or of $\mathbb{Q}$. However, if $\alpha$ is the real
cube root of 2, then $K = \mathbb{Q}(\alpha)$ is not a normal extension of $\mathbb{Q}$. For the polynomial
$x^3 - 2$ is irreducible over $\mathbb{Q}$, by Eisenstein’s criterion; it has a root $\alpha \in K$;
but K does not contain the other two roots of $x^3 - 2$, since they are non-real,
whereas K is contained in the real numbers. In fact, we have the factorisation
$x^3 - 2 = (x - \alpha)(x^2 + \alpha x + \alpha^2)$ in $K[x]$, where the second factor is irreducible
in $K[x]$.
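The factorisation, and the claim that the remaining roots are non-real, can be verified in a line or two. The following worked check is ours; it uses only $\alpha^3 = 2$ and the fact that $\alpha$ is real and non-zero.

```latex
\begin{align*}
(x - \alpha)(x^2 + \alpha x + \alpha^2)
  &= x^3 + \alpha x^2 + \alpha^2 x - \alpha x^2 - \alpha^2 x - \alpha^3 \\
  &= x^3 - \alpha^3 = x^3 - 2, \\
\operatorname{disc}(x^2 + \alpha x + \alpha^2)
  &= \alpha^2 - 4\alpha^2 = -3\alpha^2 < 0 .
\end{align*}
```

Since the discriminant of the quadratic factor is negative, it has no real roots, and so no roots in K.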


Theorem 8.18 Let K be a finite extension of F. Then K/F is a normal
extension if and only if K is the splitting field of a polynomial in $F[x]$.


Proof Suppose first that K/F is a normal extension. Since it is finite, K is
generated over F by finitely many elements $\alpha_1, \ldots, \alpha_n$. Let $f_i(x)$ be the minimal
polynomial of $\alpha_i$ over F. Then $f_i(x)$ is an irreducible polynomial in $F[x]$ with a
root in K, so (by normality) it has all its roots in K.

Consider the polynomial $g(x) = f_1(x) \cdots f_n(x)$. We know that g splits in K.
But g cannot split over any smaller field, since its splitting field contains all of
$\alpha_1, \ldots, \alpha_n$, and these elements generate K. So K is the splitting field of g.

Conversely, suppose that K is the splitting field of a polynomial g(x) over
F. Arguing for a contradiction, suppose that f(x) is an irreducible polynomial
in $F[x]$ which has a root $\alpha \in K$ and another root $\beta$ in an extension of K but
not in K. Now K is obviously the splitting field of g(x) over $F(\alpha)$. Also, the
splitting field of g(x) over $F(\beta)$ is $K(\beta)$ (since it is generated over $F(\beta)$ by the
roots of g).

Now $F(\alpha) \cong F(\beta)$, since $\alpha$ and $\beta$ are roots of the same irreducible polynomial
over F. Moreover, there is an F-isomorphism from $F(\alpha)$ to $F(\beta)$ (see Section 7.16).
By the uniqueness result for splitting fields, this F-isomorphism can be extended
to an F-isomorphism between the splitting fields K and $K(\beta)$ of g(x) over $F(\alpha)$
and $F(\beta)$ respectively. So $[K(\beta) : F] = [K : F]$. Since $K \subseteq K(\beta)$, this implies
that $[K(\beta) : K] = 1$, so $\beta \in K$, contrary to assumption. The contradiction shows
that our assumption is untenable, and K/F is normal.


There is another potential difficulty, concerned with repeated roots of irre- 
ducible polynomials. As we saw in Section 6.6, this cannot occur over a perfect 
field (this includes all fields of characteristic zero, and all finite fields). The fix 
is to define away the difficulty; it is somewhat technical, and you may want to 
skip the next few paragraphs. 

Recall that an irreducible polynomial is called separable if its roots in a 
splitting field are all distinct. We now extend this definition. An arbitrary 
polynomial f over F is called separable if all its irreducible factors are separable. 
[Note: We do not require that all the roots of f are distinct. We do not mind if it 
has a repeated irreducible factor.] We say that an extension K/F is separable 
if, for every element a ∈ K, the minimal polynomial of a over F (which we know 
is irreducible) is separable. 

How do these two definitions of separability relate to each other? 


Theorem 8.19 A finite normal extension K/F is separable if and only if K 
is the splitting field of a separable polynomial over F. 


Proof Examine the proof of the preceding theorem. If K/F is normal, then K 
is the splitting field over F of a polynomial g which is constructed as the product 
of the minimal polynomials of some elements of K. If K/F is separable, then all 
these minimal polynomials are separable, and hence so is g. 


The converse is more difficult. Before tackling it, we develop some elementary 
properties of separability. 


Proposition 8.20 Let a be separable over F, and let K be a field containing F. 
Then a is separable over K. 


Proof The minimal polynomial of a over K divides its minimal polynomial 
over F. So, if the latter has no repeated roots, neither does the former. 


Proposition 8.21 Let F be a field of characteristic p ≠ 0, and let a be algebraic 
over F. Then a is separable over F if and only if F(a) = F(a^p). 


Proof Suppose that F(a^p) = F(a). Then a ∈ F(a^p), which is a finite extension 
of F; so a = f(a^p) for some polynomial f over F. Now the polynomial 
g(x) = f(x^p) − x has derivative −1, and hence has no repeated roots, and is 
satisfied by a; so a is separable over F. 

Conversely, suppose that a is separable over F. Then it is separable over F(b), 
where b = a^p. But a satisfies the polynomial x^p − b = x^p − a^p = (x − a)^p over 
F(b). Since it is separable, its minimal polynomial over F(b) must be x − a; so 
a ∈ F(b), and F(a) = F(b). 
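
The identity x^p − a^p = (x − a)^p used in this proof rests on the fact that the binomial coefficients C(p, k) with 0 < k < p are all divisible by p. A minimal sketch of that check, for a handful of small primes chosen here for illustration (the function name is hypothetical):

```python
from math import comb

def middle_binomials_vanish(p):
    """True if p divides C(p, k) for every 0 < k < p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))

# For prime p every middle term of (x - a)^p vanishes in characteristic p,
# leaving x^p + (-a)^p = x^p - a^p (for p = 2, note that -1 = 1).
primes = [2, 3, 5, 7, 11, 13]
binomial_check = {p: middle_binomials_vanish(p) for p in primes}

# The property genuinely depends on primality: it fails for 4, since C(4, 2) = 6.
fails_for_four = not middle_binomials_vanish(4)
```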


Theorem 8.22 Let K be an extension of F. Then the set of elements of K 
which are separable over F is a subfield of K containing F. 


Proof We must show that, if a and b are separable over F, then so are a + b, ab, 
and (if a ≠ 0) a^{−1}. The argument is given for a + b below. It is almost identical 
for ab, while a^{−1} is an easy exercise. 

Suppose, for a contradiction, that a and b are separable but a + b is not. 
Then F has non-zero characteristic, say p. By Proposition 8.20, we can enlarge F 
without changing the fact that a and b are separable. Since F(a + b) ≠ F((a + b)^p), 
the element a + b is inseparable over F((a + b)^p). So, replacing F by F((a + b)^p) 
if necessary, we may assume that (a + b)^p ∈ F. 

With this assumption, we have 


F(a) = F(a^p) = F(b^p) = F(b), 


the first and third equalities holding by Proposition 8.21, since a and b are 
separable over F; the middle equality holds because a^p + b^p = (a + b)^p ∈ F. 
Hence a + b ∈ F(a). 

We show next that c^p ∈ F for any c ∈ F(a + b). Any such element is a 
polynomial in a + b, say c = Σ d_i (a + b)^i, with d_i ∈ F; then 
c^p = Σ d_i^p (a + b)^{ip} ∈ F as claimed. 

Now let m = [F(a) : F(a + b)], and let g(x) be the minimal polynomial of a 
over F(a + b), an irreducible polynomial over F(a + b). Let g(x) = Σ c_j x^j. Then 
g(x)^p = Σ c_j^p x^{jp} is a polynomial of degree mp over F, and g(a)^p = 0. Since 
[F(a) : F] = mp, the minimal polynomial of a over F has degree mp, and so it 
must be g(x)^p. But this polynomial has at most m distinct roots, contradicting 
the fact that a is separable over F. 

This contradiction shows that a + b is separable. 


Now we complete the proof of the theorem. Suppose that K is the splitting 
field of a separable polynomial g(x) over F. Let L be the field consisting of all 
elements of K separable over F. Then L contains the roots of g by assumption; 
so L = K. 

The theory that Galois developed applies to all finite normal separable 
extensions. So we make a definition: 


Definition Let K/F be a field extension. We say that K/F is a Galois 
extension if it is finite, normal, and separable. 


8.7 The main theorem. Galois Theory relates field extensions to groups. 
The groups arise in the following way: 


Definition Let K/F be a Galois extension of fields. The Galois group of the 
extension, written Gal(K/F), is the group of all F-automorphisms of K; that is, 
all isomorphisms from K to itself which leave every element of F fixed. 


Although the modern definition of a group was devised much later than 
the time of Galois, he understood what is meant by the statement ‘the F- 
automorphisms of K form a group’. This group carries in its structure a lot 
of detailed information about the field extension. The key is the following piece 
of numerical information: 


Theorem 8.23 Let K/F be a Galois extension with Galois group G. Then 
|G| = [K : F]. 


Proof We use induction on [K : F]; the result is clearly true when K = F. So 
we assume the result for extensions of smaller degree than n = [K : F]. 

Choose an element a ∈ K \ F, and let its minimal polynomial f(x) have 
degree m. Then [F(a) : F] = m, and so [K : F(a)] = n/m. Also, K is a Galois 
extension of F(a). So the group of F(a)-automorphisms of K has order n/m. 

Now G acts on the set Ω of roots of f(x) in K, since the coefficients of this 
polynomial lie in F. The stabiliser G_a of a fixes every element of F(a), and so is 
the Galois group of K over F(a). By induction, |G_a| = n/m. 

Also, the roots of f are all distinct (by separability), and lie in K (by 
normality); so |Ω| = m. Moreover, Proposition 7.42 implies that G acts 
transitively on Ω. (We can regard K as the splitting field of a polynomial having f 
as one of its factors: now, if α, β ∈ Ω, then the F-isomorphism carrying α to β 
extends to an F-automorphism of K.) 

By the Orbit-Stabiliser Theorem 7.2, we have 


|G| = |Ω| · |G_a| = m · (n/m) = n, 


and we are done. 


The most important fact about the connection between a field extension 
K/F and its Galois group G is phrased in terms of subgroups H of G and 
subfields L intermediate between F and K. If L is such a subfield, then K is a 
Galois extension of L, and so the Galois group Gal(K/L) is a subgroup of G. 
(It consists of those automorphisms of K which fix, not only all of F, but all 
of L.) In the other direction, let H be a subgroup of G. Let Fix(H) be the set 
of elements of K which are fixed by all the automorphisms in H. Then Fix(H) 
is a subfield of K: for, if an automorphism fixes two elements a and b, then it 
also fixes their sum, difference, product, and quotient (if b ≠ 0). Also, Fix(H) 
contains F, since all elements of G are by definition F-automorphisms. 

Now we can state the Main Theorem: 


Theorem 8.24 (Fundamental Theorem of Galois Theory) Let K/F be 
a Galois extension with Galois group G. Then the maps 


L ↦ Gal(K/L), 
H ↦ Fix(H), 


are mutually inverse bijections between the set of subfields of K containing F 
and the set of subgroups of G. Moreover, we have 


(a) [K : L] = |Gal(K/L)|, [L : F] = |G : Gal(K/L)|; 

(b) if L_1 and L_2 are intermediate fields, then L_1 ⊆ L_2 if and only if 
Gal(K/L_2) ⊆ Gal(K/L_1); 

(c) L/F is a normal extension if and only if Gal(K/L) is a normal subgroup 
of Gal(K/F); 

(d) if the equivalent conditions of (c) hold, then Gal(L/F) is isomorphic to 
Gal(K/F)/Gal(K/L). 


Remark Correspondences with property (b) (that is, correspondences between 
ordered sets which reverse the order) have become known as Galois correspon- 
dences. The most important example is the one described in this theorem. Note 
also the double occurrence of the word ‘normal’ in part (c). We see that the use 
of the same term to describe the two very different concepts in group theory and 
field theory is not accidental! 


We need the following lemma. Its converse is also true, as we will see later. 


Lemma 8.25 Let K/F be a finite extension. Suppose that there are only finitely 
many fields L intermediate between F and K (that is, satisfying F ⊆ L ⊆ K). 
Then K = F(a) for some a ∈ K. 


Proof First, if F is a finite field, then so is K. Thus the hypothesis is certainly 
true. We showed in Proposition 7.45 that the multiplicative group of K is cyclic. 
If a is a generator, then obviously K = F(a), and so the conclusion is also true. 
So we may suppose that F is infinite. 

Suppose that K/F has only finitely many intermediate fields. Then for any 
a, b ∈ K, we claim that there exists t ∈ F such that F(a, b) = F(a + tb). For the 
infinitely many intermediate fields F(a + sb), for s ∈ F, cannot all be distinct; 
so there exist distinct s_1, s_2 ∈ F with F(a + s_1 b) = F(a + s_2 b) = L, say. Then 
L contains ((a + s_1 b) − (a + s_2 b))/(s_1 − s_2) = b, and L contains 
(s_2(a + s_1 b) − s_1(a + s_2 b))/(s_2 − s_1) = a. So L = F(a, b) as required. 

Now K/F is finite, so K = F(a_1, ..., a_t) for some a_1, ..., a_t ∈ K. Inductively 
use the above observation to replace these t generators by a single one. 
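
The two quotients in this proof recover a and b from a + s_1 b and a + s_2 b using field operations only. The following sketch checks the formulas with exact rational arithmetic; the rationals merely stand in for elements of K, and the function name and sample values are chosen here for illustration.

```python
from fractions import Fraction

def recover_generators(u1, u2, s1, s2):
    """Given u1 = a + s1*b and u2 = a + s2*b with s1 != s2, return (a, b).

    Only field operations are used, so a and b lie in any field
    containing u1, u2, s1 and s2.
    """
    if s1 == s2:
        raise ValueError("s1 and s2 must be distinct")
    b = (u1 - u2) / (s1 - s2)
    a = (s2 * u1 - s1 * u2) / (s2 - s1)
    return a, b

# Illustrative values only.
a, b = Fraction(3, 7), Fraction(-5, 2)
s1, s2 = Fraction(1), Fraction(4)
recovered = recover_generators(a + s1 * b, a + s2 * b, s1, s2)
```

The requirement that s_1 and s_2 be distinct is exactly what makes the denominators non-zero.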


Now we return to the proof of the Fundamental Theorem. 


Proof For any intermediate field L, K is a Galois extension of L, and so 
|Gal(K/L)| = [K : L]. But we have L ⊆ Fix(Gal(K/L)) = L′, say; and thus 
Gal(K/L) = Gal(K/L′), and |Gal(K/L)| = [K : L′]. It follows that [L′ : L] = 1, 
so that L = L′. Thus, Fix(Gal(K/L)) = L. 

This also shows that there are only finitely many fields L between F and K. 
By Lemma 8.25, K = F(a) for some a ∈ K. 

Now let H be any subgroup of G. Then K is a Galois extension of Fix(H), 
with Galois group H′ (say), where H ≤ H′. If we can prove that equality holds, 
then Gal(K/Fix(H)) = H, and we have shown that Gal and Fix are mutually 
inverse bijections. This follows from the next fact, with E = Fix(H): 


Let K/E be a Galois extension with Galois group H. If H′ < H, 
then Fix(H′) ⊋ E. 


Suppose, for a contradiction, that Fix(H′) = E. Let K = E(a) have degree 
n over E, and let a = a_1, ..., a_n be the roots of the minimal polynomial of 
a over E. Now H permutes a_1, ..., a_n transitively, and hence regularly. So the 
proper subgroup H′ cannot act transitively on this set. Let {a_1, ..., a_r} be the 
orbit of H′ containing a, so that r < n. Then the coefficients of the monic 
polynomial with roots a_1, ..., a_r, being elementary symmetric functions of these 
quantities, are all fixed by H′, and so lie in E. Then a = a_1 satisfies a polynomial 
of degree r over E, contradicting the fact that [E(a) : E] = n. 


Parts (a) and (b) of the Fundamental Theorem are now clear. 

We turn to part (c). For any g ∈ G = Gal(K/F), the conjugate Gal(K/L)^g 
fixes elementwise the field Lg = {ag : a ∈ L}. So Gal(K/L) is a normal subgroup 
of G = Gal(K/F) if and only if Lg = L for all g ∈ G. But, if f is an irreducible 
polynomial over F with a root a ∈ L, then every root of f is the image of a 
under an element of G; so the condition Lg = L for all g ∈ G is equivalent to 
the condition that L contains all the roots of any polynomial such as f; that is, 
that L is a normal extension of F. Thus (c) holds. 

Finally, suppose that these conditions hold for L. Then any element of G 
fixes L as a set; so we have a homomorphism from G = Gal(K/F) to Gal(L/F) 
given by restricting elements of G to L. The kernel of this homomorphism is 
just the subgroup of G fixing L elementwise, that is, Gal(K/L); and, since every 
F-automorphism of L extends to an F-automorphism of K by Proposition 7.42, 
the image of the homomorphism is Gal(L/F). Thus (d) follows from the First 
Isomorphism Theorem. 


8.8 Solubility by radicals. We now come to the most famous application 
of the theory developed by Galois: a criterion for the solubility of equations 
by radicals. Throughout this section, we make the assumption that our fields 
have characteristic zero. (The theory needs only minor changes in non-zero 
characteristic: see Exercise 8.14.) 

First, what is meant by solving an equation by radicals? Consider the familiar 
formula for the solutions of the quadratic equation ax^2 + bx + c = 0; namely, 


x = (−b ± √(b^2 − 4ac)) / (2a). 


The solutions are obtained from the coefficients of the polynomial by applying 
field operations and extracting the square root of an element b^2 − 4ac: this final 
step may require a field extension. For a more complicated example, consider 
the equation x^4 − 4x^2 + 2 = 0, with solutions x = ±√(2 ± √2). The solutions 
lie in a field which is obtained from Q first by adjoining √2, and then adjoining 
√(2 + √2). (Since 


√(2 − √2) = √2 / √(2 + √2), 


the resulting field contains all the solutions.) 
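
The quartic example can be checked numerically. This is a floating-point sketch only (variable names chosen here): it builds the four claimed solutions from √2 and √(2 + √2) by field operations and evaluates the quartic at each.

```python
from math import sqrt, isclose

def quartic(x):
    return x ** 4 - 4 * x ** 2 + 2

r2 = sqrt(2)                    # first adjoin sqrt(2)
outer_plus = sqrt(2 + r2)       # then adjoin sqrt(2 + sqrt(2))
outer_minus = r2 / outer_plus   # claimed to equal sqrt(2 - sqrt(2))

roots = [outer_plus, -outer_plus, outer_minus, -outer_minus]
residuals = [abs(quartic(x)) for x in roots]
identity_ok = isclose(outer_minus, sqrt(2 - r2), rel_tol=1e-12)
```

No third square root is extracted: the remaining pair of roots comes from a quotient of elements already in the field.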
We make the following definitions: 


(a) A field extension K/F is a simple radical extension if K = F(a) for 
some a ∈ K with a^n ∈ F, [K : F] = n, and F contains a primitive nth root 
of unity. 

(b) A field extension K/F is a radical extension if there are fields F = 
F_0 ⊆ F_1 ⊆ ... ⊆ F_r = K such that F_i/F_{i−1} is a simple radical extension for 
i = 1, ..., r. 

(c) The polynomial f ∈ F[x] is soluble by radicals over F if its splitting 
field is contained in a radical extension of F. 


Remark The condition about roots of unity in the definition of a simple radical 
extension is vacuous if n = 2, since the square roots of unity are ±1, both in F. 
In general, it is possible to do without this condition, but we start with it (it 
makes the arguments much simpler) and then work around it at the end. 


Recall the definition of a soluble group in Section 7.5: the finite group G is 
soluble if there is a series 


G = G_0 > G_1 > ... > G_r = 1 


of subgroups of G, where G_i ⊴ G_{i−1} and G_{i−1}/G_i is cyclic for 
i = 1, ..., r. 


Theorem 8.26 Let f be a polynomial over a field F of characteristic zero, with 
Galois group G. Suppose that F contains a primitive |G|th root of unity. Then 
f is soluble by radicals over F if and only if G is a soluble group. 


We begin the proof with a special case. 


Lemma 8.27 Let K/F be an extension of degree n, and assume that F contains 
a primitive nth root of unity. Then K/F is a simple radical extension if and only 
if it is normal with cyclic Galois group. 


Proof Suppose first that K/F is a simple radical extension, say K = F(a), 
where a^n = c ∈ F. Let ω be a primitive nth root of unity. Then the roots of 
the polynomial x^n − c are a, aω, ..., aω^{n−1}; so K is the splitting field of this 
polynomial, and is normal. Now we have K = F(a) = F(aω^i) for any i; so there 
is a unique element σ_i of the Galois group G = Gal(K/F) mapping a to aω^i. 
Then G = {σ_0, σ_1, ..., σ_{n−1}}. Now σ_1^i maps a to aω^i; so σ_1^i = σ_i, and G is 
cyclic (generated by σ_1). 


Conversely, suppose that K/F is normal, with cyclic Galois group G = ⟨σ⟩. 
We have to find an element a ∈ K such that a^n ∈ F and K = F(a). It suffices 
to find a ∈ K with a ≠ 0 such that aσ = ωa, where ω is a primitive nth root of 
unity. For then (a^n)σ = ω^n a^n = a^n, so b = a^n ∈ Fix(σ) = F; and the images of a 
under powers of σ (the nth roots of b) are all the roots of the minimal polynomial 
of a, so [F(a) : F] = n. 

We find such an a by a trick. According to Artin’s Lemma 8.28 (see below), 
the n endomorphisms 1, σ, σ^2, ..., σ^{n−1} are linearly independent over K. So there 
exists x ∈ K such that 


a = x + xσ/ω + xσ^2/ω^2 + ··· + xσ^{n−1}/ω^{n−1} ≠ 0. 


(If this expression were zero for all x, then the linear combination 1 + σ/ω + 
··· + σ^{n−1}/ω^{n−1} would be the zero map.) Then 


aσ = xσ + xσ^2/ω + ··· + x/ω^{n−1} = ωa, 


as required. 


Proposition 8.28 (Artin’s Lemma) Let σ_1, ..., σ_n be distinct automorphisms 
of a field K. Then σ_1, ..., σ_n are linearly independent over K (in the 
sense that, if 


Σ_{i=1}^{n} a_i(xσ_i) = 0 


for all x ∈ K, where a_1, ..., a_n ∈ K, then a_1 = ... = a_n = 0). 


Proof Suppose that we have such a relation, with not all the coefficients equal 
to zero. Suppose that the number m of non-zero coefficients is as small as possible. 
Clearly, m > 1. We derive a contradiction by producing another relation with 
fewer non-zero coefficients. 

Assume that a_1 ≠ 0 and a_2 ≠ 0. Since the automorphisms are distinct, there 
exists an element y ∈ K with yσ_1 ≠ yσ_2. Now (xy)σ_i = (xσ_i)(yσ_i) for each i. 
Applying the dependence relation to xy, we obtain 


a_1(xσ_1)(yσ_1) + a_2(xσ_2)(yσ_2) + ··· = 0 


for all x ∈ K. Multiplying the original equation by yσ_1 and subtracting gives 


a_2(yσ_2 − yσ_1)(xσ_2) + ··· = 0. 


This relation has fewer non-zero terms than the original one, but is not identically 
zero (since a_2 ≠ 0 and yσ_1 ≠ yσ_2). So we have the required contradiction. 


Now we prove the theorem. 
Suppose first that f is soluble by radicals, and let 


F = F_0 ⊆ F_1 ⊆ ... ⊆ F_r = K, 


where F_i/F_{i−1} is a simple radical extension of degree n_i for i = 1, ..., r, and 
f splits in K. We may assume that K is a normal extension of F; let G be its 
Galois group. Let G_i = Gal(K/F_i) for i = 0, ..., r. Then, by the FTGT, 


G = G_0 > G_1 > ... > G_r = 1, 


where G_i ⊴ G_{i−1}; and G_{i−1}/G_i ≅ Gal(F_i/F_{i−1}) is cyclic (of order n_i) for i = 
1, ..., r, by Lemma 8.27. Thus, by definition, the group G is soluble. Now, if L is 
the splitting field of f, then Gal(L/F) is a homomorphic image of Gal(K/F) = G 
(again by the FTGT), and so is soluble. 

Conversely, suppose that L is the splitting field of f, and that G = Gal(L/F) 
is soluble. Then, by definition, there are subgroups 


G = G_0 > G_1 > ... > G_r = 1, 


where G_i ⊴ G_{i−1} and G_{i−1}/G_i is cyclic (of order n_i, say) for i = 1, ..., r. Letting 
F_i = Fix(G_i), we see by the FTGT that [F_i : F_{i−1}] = n_i and Gal(F_i/F_{i−1}) is 
cyclic of order n_i. By Lemma 8.27, F_i/F_{i−1} is a simple radical extension; so L/F 
is a radical extension, and f is soluble by radicals over F. 


Finally, we show that the assumption we used above, that the field F contains 
appropriate roots of unity, is unnecessary. We do this by showing that a root of 
unity lies in a radical extension. 


Proposition 8.29 Let w be a primitive nth root of unity over F. Then: 


(a) F(w)/F is a normal extension with abelian Galois group; 
(b) w is contained in a radical extension of F. 


Proof (a) F(ω) is the splitting field of x^n − 1 over F, since the roots of this 
polynomial are the powers of ω. Let G be its Galois group. Any element σ of G 
maps ω to some power ω^r, and is uniquely determined by r; call this element σ_r. 
Now σ_r σ_s and σ_s σ_r both map ω to ω^{rs}; so they are equal. Thus G is abelian. 


(b) Now the order of G is at most the number φ(n) of primitive nth roots 
of unity, and hence less than n. Arguing by induction, the |G|th roots of unity 
lie in a radical extension E of F. By Theorem 8.26, since an abelian group is 
soluble, E(ω) is a radical extension of E, and hence of F, as required. 


Let us see how these theoretical results lead to a formula for the solution of 
a cubic or quartic equation. For this, we need a version of Newton’s Theorem on 
symmetric functions. A symmetric function s(x_1, ..., x_n) is a polynomial in 
x_1, ..., x_n which is unchanged by any permutation of its arguments. 


Theorem 8.30 (Newton’s Theorem) Let f be a polynomial of degree n. 
Then any symmetric function of the roots of f can be expressed as a polynomial 
in the coefficients of f. 


Remark If f(x) = x^n + a_1 x^{n−1} + ··· + a_n, then (−1)^i a_i is the ith elementary 
symmetric function of the roots of f; that is, the sum of all products of the 
roots i at a time. Newton’s Theorem asserts that any symmetric function can be 
expressed as a polynomial in the elementary symmetric functions. 


The cubic Let f be a polynomial of degree 3, with roots α, β, γ. 
Since S_3 has a normal subgroup of index 2 which is cyclic of order 3, we should 
look for a function of the roots which has cyclic symmetry. Such a function is 


Δ = (α − β)(β − γ)(γ − α). 


Now D = Δ^2 is a symmetric function of the roots, and so can be calculated in 
terms of the coefficients; then Δ is found by extracting a square root. 

Now the analysis of cyclic extensions tells us to look at the quantity α + ωβ + 
ω^2γ = ρ, where ω is a primitive cube root of unity. Since ρ^3 has cyclic symmetry 
in the roots, it can be expressed in terms of Δ and the coefficients. Extracting the 
cube root gives ρ. We now have enough information to determine the roots. 
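
As an illustration of the two quantities above, the sketch below takes a cubic with known roots, chosen here as an example: (x − 1)(x − 2)(x + 3) = x^3 − 7x + 6. It assumes the standard expression −4p^3 − 27q^2 for the discriminant of x^3 + px + q, which is not derived in the text, and checks numerically that ρ^3 is unchanged by a cyclic shift of the roots but not by a transposition.

```python
import cmath

# Illustrative cubic with known roots: (x - 1)(x - 2)(x + 3) = x^3 - 7x + 6.
alpha, beta, gamma = 1, 2, -3
p, q = -7, 6  # coefficients of x^3 + p*x + q

# Delta^2 is symmetric, so it is a polynomial in the coefficients;
# the formula -4p^3 - 27q^2 is an assumed standard fact.
delta = (alpha - beta) * (beta - gamma) * (gamma - alpha)
discriminant = -4 * p ** 3 - 27 * q ** 2

omega = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity

def rho_cubed(a, b, c):
    return (a + omega * b + omega ** 2 * c) ** 3

# Cyclic shift (alpha, beta, gamma) -> (beta, gamma, alpha) leaves rho^3 alone;
# swapping alpha and beta does not.
cyclic_gap = abs(rho_cubed(alpha, beta, gamma) - rho_cubed(beta, gamma, alpha))
swap_gap = abs(rho_cubed(alpha, beta, gamma) - rho_cubed(beta, alpha, gamma))
```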


The quartic Let f be a polynomial of degree 4, with roots α, β, γ, δ. 

This time, the key observation is that S_4 has a normal Klein subgroup 
(consisting of the identity, (α β)(γ δ), (α γ)(β δ), and (α δ)(β γ)). We look for three 
functions of the roots reflecting this symmetry: we can take 


ξ = αβ + γδ, 
η = αγ + βδ, 
ζ = αδ + βγ. 


Now the three elementary symmetric functions ξ + η + ζ, ξη + ηζ + ζξ, and ξηζ 
are symmetric functions of α, β, γ, δ, and so can be calculated. Then ξ, η, ζ are 
the roots of a cubic with known coefficients, and we already know how to solve 
this. Finally, knowing these three quantities, it is easy to find α, ..., δ by solving 
quadratics. 
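
The symmetry claim behind the quartic can be tested by brute force. In this sketch the four integers are arbitrary stand-ins for the roots (they are not the roots of any particular polynomial from the text), and the helper name is chosen here.

```python
from itertools import permutations

def resolvents(a, b, c, d):
    """Return (xi, eta, zeta) for roots a, b, c, d taken in this order."""
    return (a * b + c * d, a * c + b * d, a * d + b * c)

roots = (1, 2, 3, 5)  # illustrative distinct values
base = resolvents(*roots)

# Every permutation of the roots permutes {xi, eta, zeta} among themselves,
# so their elementary symmetric functions are symmetric in the roots ...
all_permute = all(
    sorted(resolvents(*perm)) == sorted(base) for perm in permutations(roots)
)

# ... and the permutations fixing all three form the Klein subgroup.
fixing = [perm for perm in permutations(roots) if resolvents(*perm) == base]
```

For these sample values the three quantities are distinct, which is why exactly four permutations fix all of them.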

There are various tricks which can be used to streamline these methods. For 
the cubic, see Exercise 8.16. 

Finally, we can prove that quintics are not soluble by radicals. Of course, it 
suffices to write down a single quintic over Q whose splitting field is not a radical 
extension of Q. We show that the quintic f(x) = x^5 − 6x + 3 has this property. 

First, we observe that f is irreducible. This is immediate from Eisenstein’s 
criterion (using the prime 3). So the Galois group G of the polynomial is a 
transitive subgroup of the symmetric group S_5, acting on the five roots of f. 
By the Orbit-Stabiliser Theorem, |G| is divisible by 5. (This also follows by 
observing that, if a is a root of f, then the subfield Q(a) of the splitting field 
has degree 5 over Q.) By Sylow’s (or Cauchy’s) Theorem, G contains an element 
of order 5, which must be a 5-cycle. 

Next, we use some elementary calculus to show that f has exactly three real 
roots. For the stationary points of f occur when 5x^4 − 6 = 0; that is, 
x = ±(6/5)^{1/4}. Since there are only two stationary points, f can have at most 
three real roots. But f(−2) = −17, f(−1) = 8, f(1) = −2, and f(2) = 23; by the 
Intermediate Value Theorem, there is a root in each of the intervals (−2, −1), 
(−1, 1), and (1, 2). 
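
The four values quoted above are easy to confirm with exact integer arithmetic; the sketch below also counts the sign changes between consecutive sample points.

```python
def f(x):
    return x ** 5 - 6 * x + 3

points = [-2, -1, 1, 2]
values = [f(x) for x in points]

# A sign change between consecutive sample points forces a real root
# in the interval between them (Intermediate Value Theorem).
sign_changes = sum(1 for u, v in zip(values, values[1:]) if u * v < 0)
```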

Now complex conjugation is an automorphism of the splitting field of f, which 
fixes the three real roots and interchanges the two non-real ones; so it acts as a 
transposition. 

It can be shown (see Exercise 8.17) that a subgroup of S_5 which contains 
both a 5-cycle and a transposition must be the whole of S_5. So the Galois group 
of f is S_5. Now S_5 is insoluble (since it contains the non-abelian simple group 
A_5 as a normal subgroup); so f is not soluble by radicals. 
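
The group-theoretic fact quoted from Exercise 8.17 can be spot-checked by machine for one particular choice of 5-cycle and transposition. This brute-force closure is only an illustration and does not replace the proof; the helper names are chosen here.

```python
def compose(p, q):
    """Permutation 'p then q' on {0,...,4}, with permutations stored as tuples."""
    return tuple(q[p[i]] for i in range(len(p)))

def generated_subgroup(generators):
    """Closure of the generators under composition.

    In a finite group the set of all products of generators is a subgroup.
    """
    identity = tuple(range(len(generators[0])))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

five_cycle = (1, 2, 3, 4, 0)     # i -> i + 1 mod 5
transposition = (1, 0, 2, 3, 4)  # swaps 0 and 1
group = generated_subgroup([five_cycle, transposition])
```

The generated set has 5! = 120 elements, so it is the whole symmetric group on five points.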


8.9 Ruler-and-compass revisited. We now return to the subject of 
ruler-and-compass constructions. We proved in Section 6.8 a necessary condition for 
a point p to be constructible with ruler and compass from a set S of points: its 
coordinates should lie in an extension of Q(S) with degree a power of 2. (Recall 
that Q(S) is the field generated over Q by the coordinates of the points of S; 
we assume that (0,0), (1,0) ∈ S.) This enabled us to prove the impossibility of 
certain classical construction problems. But the only tool we have so far to show 
that a construction is possible is to give an explicit algorithm for doing it. It is 
possible to improve this, by giving a necessary and sufficient condition. 


Theorem 8.31 Let S be a set of points in the Euclidean plane, containing 
(0,0) and (1,0). A point p can be constructed from S with ruler and compass 
if and only if its coordinates lie in a normal extension of Q(S) with degree a 
power of 2. 


The proof uses a property of 2-groups, observed in Section 7.3: 


Lemma 8.32 Let G be a group of order 2^r. Then G has a chain 


G = G_0 > G_1 > ... > G_r = 1 


of subgroups with the property that |G_{i−1} : G_i| = 2 for i = 1, ..., r. 


Note that it follows that G_i ⊴ G_{i−1} for all i, since a subgroup of index 2 is 
normal. In particular, G is soluble. 


Proof By Lemma 8.32, there is a non-identity element g in Z(G); we may 
assume that g has order 2. Then {1, g} is a normal subgroup of G. We set G_{r−1} = 
{1, g}. Now G/G_{r−1} has order 2^{r−1}. By induction, it has a chain of subgroups, 
each of index 2 in the preceding. By the Second Isomorphism Theorem, these 
subgroups have the form G_i/G_{r−1}, where the G_i are subgroups of G forming a 
chain with the required properties. 


Proof of the theorem Suppose first that p is constructible from S. Let 
F_0 = Q(S), and let F_i be the field generated by the coordinates of all points 
constructed at step i of the construction. We showed in Section 6.8 that 
[F_i : F_{i−1}] ≤ 2. So the final extension has degree a power of 2, although it 
may not be normal. To rectify this, let f_i(x) be the product of the minimal 
polynomials of all points constructed at the ith stage, over F_0, and K_i its splitting 
field. Then K_0 = F_0, and K_i is obtained from K_{i−1} by adjoining the solutions 
to a finite number of quadratics. Hence we may interpolate a finite number of 
intermediate fields between K_{i−1} and K_i, each of degree 2 over its predecessor. 
The final field K_r is normal over F_0 and contains the coordinates of p. 

For example, suppose that we want to construct a = 2^{1/4}. Then F_0 = Q, 
F_1 = Q(√2), F_2 = Q(a). Now F_2 is not a normal extension of F_0. But the 
minimal polynomial of a over Q is x^4 − 2 = (x^2 − √2)(x^2 + √2). So K_1 = F_1, 
and K_2 is obtained by adjoining square roots of √2 and −√2. So we have the 
chain 


F_0 = Q ⊂ K_1 = Q(√2) ⊂ L = Q(2^{1/4}) ⊂ K_2 = Q(2^{1/4}, i). 


To prove the converse, we note that any quadratic equation with positive 
discriminant can be solved with ruler and compass, by intersecting a line with 
a circle. (We may assume that the quadratic is x^2 + bx + c = 0, where b^2 > 4c. 
Now draw the circle with centre (−b/2, 0) and radius √(b^2 − 4c)/2: it has equation 
x^2 + bx + c + y^2 = 0, so it intersects the x-axis in the required points.) 

So, if the coordinates of p lie in an extension of Q(S) with a chain of inter- 
mediate fields as in the theorem, then they can be found by successively solving 
quadratics, and hence constructed with ruler and compass. 


Sometimes an alternative form of the theorem is convenient, where we 
represent points in the Euclidean plane by complex numbers: given a point p with 
coordinates (x, y), we let c(p) be the complex number x + iy. For a finite set S 
of points, let c(S) = {c(p) : p ∈ S}. Now the following theorem can be shown: 


Theorem 8.33 Let S be a set of points in the Euclidean plane, containing 
(0,0) and (1,0). A point p can be constructed from S with ruler and compass if 
and only if c(p) lies in a normal extension of Q(c(S)) with degree a power of 2. 


Hint We have Q(c(p)) ⊆ Q(p)(i). 


The Greeks knew how to construct regular polygons of various numbers 
of sides (for example, pentagons and hexagons), but were unable to construct 
various others (for example, heptagons). Using Theorem 8.33, it is possible to 
give an exact characterisation of the constructible regular polygons. We need the 
concept of a Fermat prime, a prime number of the form F_n = 2^{2^n} + 1 for some 
integer n. The first few Fermat primes are F_0 = 3, F_1 = 5, F_2 = 17, F_3 = 257, 
and F_4 = 65537. Surprisingly, no further Fermat primes are known, though 
several further numbers of this form are known to be composite. (For example, 
Euler observed that 641 = 5^4 + 2^4 divides 5^4·2^28 + 2^32, and also 641 = 5·2^7 + 1 
divides 5^4·2^28 − 1, so 641 divides the difference F_5.) 
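
The arithmetic in this paragraph can be confirmed with exact integers. The sketch uses simple trial division, which is adequate only because the numbers involved are small; the function names are chosen here.

```python
def fermat(n):
    return 2 ** (2 ** n) + 1

def is_prime(m):
    """Trial division; adequate for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

first_five = [fermat(n) for n in range(5)]
first_five_prime = [is_prime(F) for F in first_five]

# Euler's two divisibility facts; their difference is F_5 = 2^32 + 1.
lhs_sum = 5 ** 4 * 2 ** 28 + 2 ** 32
lhs_diff = 5 ** 4 * 2 ** 28 - 1
```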


Theorem 8.34 A regular n-gon is constructible with ruler and compass if and 
only if n is the product of a power of 2 and a number of distinct Fermat primes. 


In particular, a regular 7-gon is not constructible, but a regular 17-gon is. 
This last fact was first observed by Gauss; he was so pleased with it that he 
made his career in mathematics, and asked for a regular 17-gon to be inscribed 
on his tombstone. This was the first significant advance in ‘ruler-and-compass 
geometry’ since the time of Euclid. 


Proof I will not give the argument in detail, but outline the steps. 

(a) By Theorem 8.33, a regular n-gon is constructible by ruler and compass if 
and only if [Q(e^{2πi/n}) : Q] is a power of 2. 

(b) [Q(e^{2πi/n}) : Q] = φ(n) (Euler’s function). In fact, the minimal polynomial 
of e^{2πi/n} over Q is the nth cyclotomic polynomial Φ_n(x), whose roots 
are all the primitive nth roots of 1, and whose degree is φ(n); and it can 
be shown that Φ_n(x) is irreducible over Q. 

(c) If n_1 and n_2 are coprime, then φ(n_1 n_2) = φ(n_1)φ(n_2); and 
φ(p^a) = p^{a−1}(p − 1) if p is prime. 


Combining these steps, we see that the regular n-gon is constructible if and 
only if each prime power factor p^a of n has the property that p^{a−1}(p − 1) is a 
power of 2. This requires that either p = 2, or a = 1 and p − 1 is a power of 2. 
Now if a number p = 2^m + 1 is prime, then necessarily m is itself a power of 2, 
and p is a Fermat prime. This completes the proof. 
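
Steps (a) and (b) reduce constructibility of the regular n-gon to the condition that φ(n) is a power of 2, which is easy to tabulate. The sketch below computes φ directly from its definition; this is slow but transparent, and the function names are chosen here.

```python
from math import gcd

def phi(n):
    """Euler's function, computed directly from the definition."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def is_power_of_two(m):
    return m >= 1 and m & (m - 1) == 0

def constructible(n):
    """Criterion from steps (a) and (b): phi(n) is a power of 2 (for n >= 3)."""
    return is_power_of_two(phi(n))

small = [n for n in range(3, 21) if constructible(n)]
```

The list omits 7, 9, 11, 13, 14, 18, and 19, in agreement with the theorem.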


8.10 The Theorem of the primitive element. One curious corollary of 
the Fundamental Theorem of Galois Theory is the following test for when a field 
extension can be generated by one element. We saw one half of this theorem 
earlier, in Lemma 8.25. 


Theorem 8.35 (The Theorem of the Primitive Element) Let K/F be a 
finite extension. Then the following are equivalent: 


(a) K = F(a) for some a ∈ K; 
(b) there are only finitely many fields L intermediate between F and K (that 
is, satisfying F ⊆ L ⊆ K). 


Proof We showed in Lemma 8.25 that (b) implies (a). For the converse, let 
K = F(a) and let f be the minimal polynomial of a over F. For any intermediate 
field L, let f_L be the minimal polynomial of a over L. Then f_L divides f. There 
are only a finite number of monic polynomials dividing f in K[x]. If we show that 
the polynomial f_L determines L, then it follows that the number of intermediate 
fields is finite. 

Let L′ be the field generated over F by the coefficients of f_L. Then L′ ⊆ L. 
Now deg(f_L) = [L(a) : L] = [K : L]. Also, f_L is irreducible over L, and hence 
over its subfield L′; so [K : L′] = deg(f_L). It follows that [L : L′] = 1, and 
so L = L′. 


Proposition 8.36 If K/F is a finite separable extension (in particular, if it is 
a Galois extension), then K = F(a) for someae K. 


Proof If K/F is a Galois extension, then the number of intermediate fields is 
equal to the number of subgroups of the Galois group, by the FTGT, and hence 
finite; so the result follows from the Theorem of the Primitive Element. 

In general, let $K = F(a_1, \ldots, a_t)$, and let $f_i(x)$ be the minimal polynomial
of $a_i$ over $F$. By assumption, each $f_i$ is separable, and hence $f = f_1 \cdots f_t$ is
separable. Thus, if $L$ is a splitting field for $f$ over $F$, then $L/F$ is a Galois
extension, so has only finitely many intermediate fields (as above). A fortiori,
the same is true for $K/F$; now argue as before.
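
As a concrete illustration (an example chosen here, not taken from the text),
take $F = \mathbb{Q}$ and $K = \mathbb{Q}(\sqrt{2}, \sqrt{3})$. The element
$a = \sqrt{2} + \sqrt{3}$ should generate $K$. The Python sketch below checks
this with exact integer arithmetic, representing elements with integer
coordinates on the basis $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$; the helper names
`mul` and `combo` are hypothetical.

```python
# An element c0 + c1*sqrt(2) + c2*sqrt(3) + c3*sqrt(6) is stored as the
# integer tuple (c0, c1, c2, c3).

def mul(u, v):
    """Multiply two elements, using sqrt2*sqrt3 = sqrt6, sqrt2*sqrt6 = 2*sqrt3, etc."""
    a0, a1, a2, a3 = u
    b0, b1, b2, b3 = v
    return (
        a0 * b0 + 2 * a1 * b1 + 3 * a2 * b2 + 6 * a3 * b3,  # rational part
        a0 * b1 + a1 * b0 + 3 * (a2 * b3 + a3 * b2),        # coefficient of sqrt(2)
        a0 * b2 + a2 * b0 + 2 * (a1 * b3 + a3 * b1),        # coefficient of sqrt(3)
        a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1,              # coefficient of sqrt(6)
    )


def combo(*terms):
    """Integer linear combination: combo((k1, u1), (k2, u2), ...) = k1*u1 + k2*u2 + ..."""
    return tuple(sum(k * u[i] for k, u in terms) for i in range(4))


ONE = (1, 0, 0, 0)
a = (0, 1, 1, 0)          # a = sqrt(2) + sqrt(3)
a2 = mul(a, a)            # a^2
a3 = mul(a2, a)           # a^3
a4 = mul(a2, a2)          # a^4

# a is a root of x^4 - 10x^2 + 1.
assert combo((1, a4), (-10, a2), (1, ONE)) == (0, 0, 0, 0)

# Both generators are polynomials in a:
# 2*sqrt(2) = a^3 - 9a  and  2*sqrt(3) = 11a - a^3.
assert combo((1, a3), (-9, a)) == (0, 2, 0, 0)
assert combo((11, a), (-1, a3)) == (0, 0, 2, 0)
```

Since $\sqrt{2}$ and $\sqrt{3}$ both lie in $\mathbb{Q}(a)$, we get
$K = \mathbb{Q}(a)$; the quartic has degree $4 = [K : \mathbb{Q}]$, as it must.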


8.11 Appendix: The Fundamental Theorem of Algebra. The 
Fundamental Theorem of Algebra asserts that any non-constant polynomial over 
the complex numbers has a root in C: in other words, C is algebraically closed. 

As we saw in Section 6.5, this is not a theorem of algebra: its proof 
requires some analysis. A proof using Liouville’s Theorem from complex analysis 
was given there. It is possible to reduce, but not eliminate, the required
analysis, as is done here, in an application of Galois theory. First we list the two
analytic facts that we require. Both are readily proved using the Intermediate 
Value Theorem. 


Proposition 8.37 (a) Any polynomial of odd degree over R has a root in R. 
(b) Any positive real number has a real square root. 


Sketch proof (a) Let $f = a_n x^n + \cdots$ with $n$ odd and $a_n \neq 0$. If $a_n > 0$,
then $f(x)$ is positive for large positive $x$, and negative for large negative $x$; by
the Intermediate Value Theorem, $f(x) = 0$ for some value of $x$. The argument is
similar if $a_n < 0$.

(b) If $a > 0$, the function $x^2 - a$ is positive for large positive $x$, and is negative
for $x = 0$; so, by the Intermediate Value Theorem, it vanishes for some positive $x$.
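
The sign-change argument in both parts is exactly what the bisection method
exploits. The following Python sketch (floating-point, so only approximate, and
not part of the original text) locates such a root. The function names are
illustrative; the interval for part (a) uses Cauchy's bound on the size of the
roots, which is an added ingredient here and is not used in the text.

```python
def bisect_root(f, lo, hi, steps=200):
    """Approximate a zero of f in [lo, hi], given that f(lo), f(hi) differ in sign."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if (f_lo < 0) == (f_hi < 0):
        raise ValueError("f must change sign on [lo, hi]")
    for _ in range(steps):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0:
            return mid
        # Keep the half-interval on which the sign change persists.
        if (f_mid < 0) == (f_lo < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def odd_degree_root(coeffs):
    """Real root of a polynomial of odd degree; coeffs are listed from the top degree down."""
    def f(x):
        value = 0.0
        for c in coeffs:          # Horner's rule
            value = value * x + c
        return value

    # Cauchy's bound: every root has absolute value below this number,
    # so f has opposite signs at -bound and +bound when the degree is odd.
    bound = 1 + max(abs(c / coeffs[0]) for c in coeffs[1:])
    return bisect_root(f, -bound, bound)


def real_sqrt(a):
    """Positive square root of a > 0: x^2 - a is negative at 0, positive at a + 1."""
    return bisect_root(lambda x: x * x - a, 0.0, a + 1.0)


root = odd_degree_root([1, 0, -2, -5])    # x^3 - 2x - 5
print(abs(root**3 - 2 * root - 5) < 1e-9)   # prints True
print(abs(real_sqrt(2.0) ** 2 - 2.0) < 1e-9)  # prints True
```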


Corollary 8.38 Any complex number has a complex square root. 


Proof Let $z = x + iy$ with $x, y \in \mathbb{R}$. Check that, if
$a^2 = (\sqrt{x^2 + y^2} + x)/2$ and $b^2 = (\sqrt{x^2 + y^2} - x)/2$, and the signs of
the square roots are appropriately chosen, then $(a + ib)^2 = z$. (The square roots
exist by Proposition 8.37(b).)
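
The formula can be checked numerically. In the Python sketch below (an
illustration only; the function name is hypothetical), the "appropriate" choice
of signs is made by taking $a \geq 0$ and giving $b$ the sign of $y$, so that
$2ab = y$.

```python
import math


def complex_sqrt(x, y):
    """Return real (a, b) with (a + ib)^2 = x + iy, using the formula of the proof."""
    r = math.hypot(x, y)                      # sqrt(x^2 + y^2)
    a = math.sqrt(max(0.0, (r + x) / 2))      # a^2 = (r + x)/2
    b = math.sqrt(max(0.0, (r - x) / 2))      # b^2 = (r - x)/2
    if y < 0:
        b = -b                                # choose signs so that 2ab = y
    return a, b


a, b = complex_sqrt(3.0, -4.0)
print(a, b)                                   # prints 2.0 -1.0
print(abs(complex(a, b) ** 2 - complex(3.0, -4.0)) < 1e-12)  # prints True
```

Here $(2 - i)^2 = 3 - 4i$, as expected.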

We also list the group theory that we need:


(a) Sylow’s Theorem (Theorem 7.5). 
(b) If |G| is a power of 2, greater than 1, then G has a subgroup of index 2. 
(This follows from the solubility of 2-groups.) 


Now we prove the theorem. Indeed, we prove the stronger assertion that $\mathbb{R}$
has no finite extension of degree greater than 1 except for $\mathbb{C}$. So let $F$ be a finite
extension of $\mathbb{R}$, of degree $n$. By enlarging $F$ if necessary, we may assume that
$F/\mathbb{R}$ is a normal extension. Let $G$ be its Galois group.

We claim first that $|G|$ is a power of 2. For let $P$ be a Sylow 2-subgroup
of $G$, and $L$ the fixed field of $P$. Then $[L : \mathbb{R}] = |G : P|$ is odd. But, by
Proposition 8.37(a), $\mathbb{R}$ has no non-trivial extension of odd degree. So $P = G$.

Let $H$ be a subgroup of $G$ of index 2, and $M$ its fixed field. Then $M$ is
a quadratic extension of $\mathbb{R}$, so of the form $\mathbb{R}(a)$, where $a^2 \in \mathbb{R}$. By
Proposition 8.37(b), $a^2$ is negative (otherwise $a$ would be real), and $-a^2$ has a
real square root $c$. So $M = \mathbb{R}(ci) = \mathbb{C}$. The Galois group of $F$ over $\mathbb{C}$ is $H$.

If $|H| > 1$, let $K$ be a subgroup of index 2 in $H$, and $N$ its fixed field. But
then $N$ is a quadratic extension of $\mathbb{C}$, contradicting Corollary 8.38. So $H = 1$
and $F = \mathbb{C}$ as required.


Exercise 8.14 (a) Let $F$ be a field of characteristic $p$. Show that any separable
extension of $F$ of degree $p$ has the form $F(\alpha)$, where $\alpha^p = \alpha + a$ for some $a \in F$.

(b) Let $F$ be a field of characteristic $p$, and $K$ a Galois extension of $F$. Prove that
there is a chain of subfields $F = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_m = K$ where, for
$i = 1, \ldots, m$, we have $F_i = F_{i-1}(\alpha_i)$, $[F_i : F_{i-1}] = n_i$, and either
$\alpha_i^{n_i} \in F_{i-1}$ and $p \nmid n_i$, or $\alpha_i^p - \alpha_i \in F_{i-1}$ and $n_i = p$.


Exercise 8.15 (a) Calculate the Galois group of the polynomial x — 2 over Q. If K 
is its splitting field, find all the subfields of K. 

(b) Show that the Galois group of the polynomial $x^3 + x^2 - 2x - 1$ over $\mathbb{Q}$ is cyclic
of order 3.


Exercise 8.16 (a) Show that the cubic $x^3 + ax^2 + bx + c = 0$ (over a field of characteristic
zero) can be reduced to one with $a = 0$ by a substitution of the form $y = x + k$ for
some $k$ (‘completing the cube’).

(b) Verify that, if $\omega$ is a primitive cube root of unity, then


$$x^3 + y^3 + z^3 - 3xyz = (x + y + z)(x + \omega y + \omega^2 z)(x + \omega^2 y + \omega z).$$


(c) Show that, if $y$ and $z$ satisfy $y^3 + z^3 = c$ and $yz = -b/3$, then the roots of the
cubic $x^3 + bx + c = 0$ are $-y - z$, $-\omega y - \omega^2 z$, and $-\omega^2 y - \omega z$.
(d) Hence solve the general cubic by radicals. 


Exercise 8.17 Show that a subgroup of $S_5$ which contains the 5-cycle (1 2 3 4 5) and
a transposition must contain all transpositions, and hence must be $S_5$.


Exercise 8.18 (a) Let G be a group and H a subgroup. Suppose that


• the intersection of the conjugates of $H$ is 1;
• there is a chain


$$G = G_0 > G_1 > \cdots > G_r = H$$


with $|G_{i-1} : G_i| = 2$ for $i = 1, \ldots, r$.


Prove that |G| is a power of 2. 
(b) Call an extension $K/F$ constructible if there is a chain of fields


$$F = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_r = K$$


with $[F_i : F_{i-1}] = 2$ for $i = 1, \ldots, r$. Prove that, if the characteristic of $F$ is not 2 and
$K/F$ is constructible, then there is a constructible normal extension $L/F$ with $L \supseteq K$.


(c) Hence give another proof of Theorem 8.31. 


Further reading 


There is no shortage of good books to choose if you want to read further. What 
follows is only a very partial and personal choice from what is available. 

The material in the first chapter is basic to all of mathematics, and is often 
covered in books on Discrete Mathematics, such as Biggs [2]. (Numbers refer to 
the bibliography below.) 

A very important part of algebra, especially for applications, which has only 
had a brief treatment in this book, is linear algebra. There are very many 
textbooks on linear algebra; I suggest Kaye and Wilson [23] or Blyth and 
Robertson [3,4]. 

There are many general algebra books which go further than this one. Two 
examples are those of Cohn [10] and Lang [24]. 

Two books covering in greater detail some of the topics in this book are 
Stewart [39] on Galois Theory, and Hartley and Hawkes [18] on modules over 
Euclidean domains and their applications. 

Books on general group theory include those of Macdonald [27] and Rose [33], 
or the more encyclopaedic two-volume work by Suzuki [40]. Beyond a certain 
point, the books become more specialised. The most detailed information about 
groups is obtained by studying their representations, either as groups of
permutations (see Cameron [8], Dixon and Mortimer [15]) or as matrix groups (Curtis
and Reiner [14], Serre [37]). Another way of looking at representations of a group 
G by matrices over a ring R is to study the finitely generated modules for the 
group ring RG: this fits in with the modern philosophy that rings are best studied 
via their module categories. Rotman’s book [34] is an introduction to homological 
algebra. 

There is a wide profusion of books on more specialised parts of group theory. 
To mention just a few, we have Gorenstein’s account [17] of the classification of 
finite simple groups, Johnson [20] on presentations of groups, Wilson [42] on the 
finite simple groups, and Leedham-Green and McKay [25] on p-groups. 

General accounts of ring theory are given by McCoy [29] and Cohn [10], and a 
more thorough coverage (in two volumes) by Rowen [35]. For algebraic geometry, 
long regarded as a fearsome subject for beginners, there are now some very good 
introductions, such as those of Shafarevich [38] and Reid [32]. 

MacLane’s book [28] is a good introduction to categories; Enderton’s 
book [16] on set theory gives you a firm foundation for Section 6.1 (and indeed 
for all of mathematics). For a brief introduction see Cameron [7]. Van Lint [26] 
will guide you further in coding theory, and Kaplansky [22] in Galois theory. 

I have also given here details of books from which I have quoted in the text. 

Further resources are available on the Web. I have listed a few below. Since
Web addresses change, it is better to search for these by name. 
GAP website: 
http://www.gap-system.org
GAP is a programming system for doing computations with algebraic
structures, especially groups. The name is an acronym for ‘Groups, Algorithms and
Programming’. 
Online Atlas of Finite Groups: 
http://brauer.maths.qmul.ac.uk/Atlas/v3/ 
Detailed information about many finite groups, especially simple groups
or those ‘close to’ being simple.
Encyclopedia of Integer Sequences: 
http://www.research.att.com/~njas/sequences/ 
Here you can look up any interesting numerical sequence, such as the number
of groups of order n for n = 1,2,.... 


MacTutor History of Mathematics: 

http://www-groups.dcs.st-and.ac.uk/~history/

Here you can read the fascinating stories of the people who created modern 
algebra. 